Subject : System Failure Analysis (savecore, hangup, Panic, watchdog-reset)
Description :
1. Setup savecore ( 1.X and 2.X )
2. Hangup
3. Panic
4. Watchdog Reset
< 1. Setup savecore >
1. Solaris 1.X : How to setup savecore
1) Customizing /etc/rc.local
....
# Default is to not do a savecore
#
# mkdir -p /var/crash/`hostname`
# echo -n 'checking for crash dump... '
# intr savecore /var/crash/`hostname`
# echo ''
2) Uncomment those lines to enable savecore (the default is to not do a savecore).
# Default is to not do a savecore
#
mkdir -p /var/crash/`hostname`
echo -n 'checking for crash dump... '
intr savecore /var/crash/`hostname`
echo ''
3) -p option of mkdir says to create the parent directories if they
don't already exist.
4) Configuring a special dump device
- As already discussed, the primary swap device is normally used as the
dump device.
cf) config vmunix swap on sd1b
config vmunix swap on dumps on sd2f
2. Solaris 2.X : How to setup savecore
1) Customizing /etc/rc2.d/S20sysetup
......
##
## Default is to not do a savecore
##
if [ ! -d /home/lsh/crash/`uname -n` ]
then mkdir -p /home/lsh/crash/`uname -n`
fi
echo 'checking for crash dump...\c '
savecore /home/lsh/crash/`uname -n`
echo ''
....
2) Displaying the dumpfile kernel variable via adb
hyundai3# adb -k /dev/ksyms /dev/mem
physmem 3e1a
dumpfile/20X
dumpfile:
dumpfile: 0 0 0 0
2f646576 2f64736b 2f633074 33643073
31000000 0 0 0
0 0 0 0
0 0 0 0
dumpfile+10/X
dumpfile+0x10: 2f646576
dumpfile+10/s
dumpfile+0x10: /dev/dsk/c0t3d0s1
$q
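The hex words printed by dumpfile/20X are simply the ASCII bytes of the dump device path, which is why dumpfile+10/s prints /dev/dsk/c0t3d0s1. A minimal sketch of that decoding, assuming big-endian 32-bit words as on SPARC (the helper name decode_adb_words is hypothetical, not part of adb):

```python
def decode_adb_words(words):
    """Turn 32-bit hex words (as printed by adb's X format) into a C string."""
    # Each word is 4 bytes; SPARC is big-endian, so keep the byte order as printed.
    raw = b"".join(int(word, 16).to_bytes(4, "big") for word in words)
    # A C string ends at the first NUL byte.
    return raw.split(b"\x00", 1)[0].decode("ascii")


# The five non-zero words from the dumpfile/20X output above.
words = ["2f646576", "2f64736b", "2f633074", "33643073", "31000000"]
print(decode_adb_words(words))
```

Running it prints the same path that the /s format shows in the adb session.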
< 2. System Hangup>
1. What is a system hang ?
- System hangs can be a major frustration for a system administrator.
Sooner or later every administrator watches a system that was alive and
well slow down considerably and, some time later, end up "hung". Hangs have
many different causes, but they show one common symptom: the system is no
longer fully usable. Unlike a panic, which always makes the system completely
unusable at once, a hang may eat away at system resources slowly until the
system finally becomes useless.
- When looking at kernel errors you will find that not every problem leads to
a panic with a core dump. Sometimes a system simply hangs, and it is then
desirable to force a core dump so that the contents of memory can be examined
to find out what caused the hang.
2. What conditions cause hangs ?
- A common cause of a system hang is deadlock: one process waits for something
locked by another process, while that other process waits for a resource
locked by the first.
- System hangs can also occur when resources dry up and the system has to sit
around waiting for more resources before it can continue doing what was
asked of it.
- Occasionally a system hangs because of hardware problems. For example, a
faulty data transfer cable attached to a disk drive causes communication
problems between the system and the drive, and the result is a hung bus.
- When an application hangs in a loop, other users of the system are not
affected. As long as the process is not eating up disk space or some other
kernel resource, a hung program does not affect the rest of the system.
- Hangs arise under various conditions and show different characteristics.
* A hung system may not even respond to ping, the command that sends a
low-level ICMP request to it. If it does respond, the kernel is still able
to recognize and answer network interrupts at that moment.
* The system may echo characters typed at the keyboard or follow mouse
movements, yet not respond to commands or even to the abort sequence. This
can be a hang caused by deadlock, where processes wait for resources to become
available before they can continue. In that case the processes never become
ready to run. The output of ps will probably show many processes in the
D wait state.
* A complete hang with no keyboard echo at all may be a STREAMS problem.
Sometimes even L1-A has no effect in this case.
* On server systems, the LEDs on the CPU board show the state of the system.
In the normal case they show a bouncing or regularly moving light. If the
lights move but very slowly, the system is very busy: either the kernel is
looping, or an external device such as one or more modem lines is generating
a large number of interrupts.
Frozen lights indicate a H/W problem.
3. Capturing system hang information
- In most cases a crash dump of a hung system can be forced, but this is
not guaranteed for every hang condition.
To force a dump you must drop to the boot PROM monitor, which suspends all
current program execution. The key sequence is L1-A.
On systems using ASCII terminals for the console, usually the Break key can
be used to get to the boot PROM monitor.
- Not every hang can be interrupted. If L1-A does not work, unplugging the
console keyboard, or unplugging the terminal for a few minutes, sometimes
does. If all of this fails, the only option left is to power the system down.
4. Sun-4d
- The psrinfo (print processor info) and psradm (processor admin) commands are
useful for displaying the status of a multiprocessor system and controlling it.
- sun4d systems (SPARCserver 1000, SPARCcenter 2000) have, in addition to
special H/W features that help with diagnosis, a new command called prtdiag.
- There are two different kinds of watchdog reset, and both usually indicate a
H/W problem. A system watchdog reset is usually caused by a H/W error and
resets the whole system.
- The POST routines save information about the watchdog reset, which can be
examined with the prtdiag -v command.
- A local CPU watchdog reset occurs when a single processor is reset due to
a trap occurring when traps are disabled (a "standard" watchdog).
The system drops into the OBP.
< 3. Panic >
1. What happened ?
- Computers crash. It's just a fact of life.
Depending on the H/W and S/W, some systems crash often and some never do.
- Ever since UNIX has existed, many people have worked on analyzing UNIX
system crash dumps, and they have become able to determine the cause of a
crash from the files produced after it.
2. What is a system crash ?
- According to UNIX, computer systems have been crashing since midnight,
January 1, 1970.
- A system crash is a situation in which the system suddenly becomes unusable,
usually under one of the following conditions:
( System panics & bad traps, Watchdog resets, Dropping out to boot PROM)
3. What conditions cause panics ?
- Some people detest panics, but panics are better thought of as safeguards
for system and data integrity.
- A system panic message results from one of two causes:
a software consistency check or a hardware fault.
- A good O/S programmer checks the integrity of system resources before
referencing or manipulating them, and inserts calls to the panic() routine
into the code for that purpose. For example, when the code is about to free
a disk block that is marked as in use, it first verifies that the block is
still marked as in use. If the block turns out to be marked free before the
code has freed it, the code must not free it. But how did the block get freed
as if by magic? How, where, and what went so badly wrong?
By calling panic() at this point, the system programmer brings the system to
an abrupt halt, which protects the system and prevents further corruption
until the problem is found.
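The disk-block example can be sketched in a few lines. This is only an illustration of a consistency check, not kernel code: KernelPanic, panic and free_block are hypothetical names, and raising an exception stands in for halting the system.

```python
class KernelPanic(Exception):
    """Stand-in for the kernel's panic(); hypothetical, for illustration only."""


def panic(message):
    # A real kernel would dump memory and stop; here we just raise.
    raise KernelPanic(message)


def free_block(in_use, block):
    """Free a disk block, but only after checking it is still marked in use."""
    if block not in in_use:
        # Someone else already freed it: the bookkeeping can no longer be trusted.
        panic("freeing free block %d" % block)
    in_use.remove(block)


in_use = {10, 11, 12}
free_block(in_use, 11)          # normal case: block 11 was in use
try:
    free_block(in_use, 11)      # second free trips the consistency check
except KernelPanic as exc:
    print("panic:", exc)
```

The point of the check is that stopping immediately is cheaper than continuing with a free list that is known to be inconsistent.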
- panic() is only called while the O/S is in kernel mode. However, any program
that exercises a bug in the O/S can cause a panic. For example, a user program
that uses a new device driver still being debugged moves into kernel mode
every time the driver is used, and once in kernel mode a panic can occur.
To the user it looks as if the program caused the panic, but in fact the
program was only the trigger for the events that led to the panic. In short,
if the system panics, it is because the system detected a condition in which
data integrity was in doubt or data corruption was suspected.
- Consider the data integrity concept from the point of view of user-level
programming. If you write a program that opens a file with the open() system
call, you will probably check the status returned by open() to see whether
the open really succeeded before moving on to the next step. If open() failed,
your program will probably report this and exit, prompt for a new file name,
or simply take some other course of action. If you ignore the status returned
by the open() system call, you will run into trouble somewhere further down
the line, and your data integrity will be at risk.
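As a user-level illustration of checking the open() status, here is a minimal sketch. Python's os.open reports failure by raising OSError rather than returning -1 as the C call does; the helper name open_checked and the path used are hypothetical.

```python
import os
import sys


def open_checked(path):
    """Open a file read-only; return its descriptor, or None if open() failed."""
    try:
        return os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Report the failure instead of carrying on with a bad descriptor.
        print("cannot open %s: %s" % (path, exc.strerror), file=sys.stderr)
        return None


fd = open_checked("/no/such/file")   # hypothetical path, expected to fail
if fd is not None:
    os.close(fd)
```

A caller that skipped the None check would fail later, at the first read, which is the "trouble further down the line" described above.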
- Does the car you drive have anything like a panic() routine?
If it is fitted with an air bag, the answer is yes. When the front bumper
detects something like a high-speed collision, the air bag inflates and
protects the driver.
- Software (the kernel) contains a great many hardcoded validity tests, which
check for invalid pointers or impossible conditions before continuing.
A panic can be one of two types:
a regular panic message, or an assertion.
- For regular panic messages, all you normally get is the message itself.
Each message is unique and identifies the problem exactly. You can find it
once in the source code.
- Assertion messages come from a macro that prints "panic: assertion failed"
on the console, together with a message describing the erroneous condition.
In this case the item of interest is the condition message that accompanies
panic:, which shows the test, the file, and the line number in the code.
- Unexpected hardware traps cause panics. This is generally the case when an
invalid address is accessed from the kernel. Because the OS itself is not
paged, a fault from kernel code is a cause of immediate death. Unlike software
panic messages, hardware traps show the exact state of the system and result
in a traceback printed on the console.
This usually appears in the /var/adm/messages file as well.
- These panics usually indicate a hardware-related or hardware-detected fault.
The kinds are:
- trap : for any unexpected trap into or from kernel mode
- bus error (Sun-3) : a kernel segmentation violation.
- text fault : an attempt to fetch an instruction from a bad place.
- data fault : generally an erroneous pointer.
- address alignment : also generally a bad pointer.
- illegal instruction : possibly an attempt to execute "data"
4. A word about bad traps
- A computer system also crashes when the H/W detects a condition that should
never happen. On UNIX systems this kind of crash is called a "bad trap", and
from the system administrator's point of view bad traps and S/W panics should
be handled in the same way. UNIX systems take millions of traps a day, so do
not assume a panic just because you hear the word trap. On rare occasions,
however, you will meet a bad trap, and when your UNIX system does, it invokes
panic().
- In SPARC terms, a trap causes an immediate branch into kernel code, that is,
an interruption of the normal execution of instructions.
This interruption can be caused by a user request (a system call) or by some
external event (a page fault, a disk interrupt, a keystroke).
In either case the interruption is processed by the H/W and by very low-level
software, so understanding how traps are taken and handled requires an
understanding of the system architecture.
The CPU H/W recognizes the type of trap and tries to find the correct place
to go to handle it. The kernel has to set up several control registers to
make sure that the appropriate trap handling code is reached.
Once the system is up and user processes are running, a trap is the only way
the kernel gets control from a user program.
A trap is the means by which a user request is processed (the kernel runs on
behalf of the user program) and by which a device is controlled (the kernel
runs because of some external request).
5. Kinds of traps
- Two basic kinds of trap can occur: synchronous and asynchronous.
A synchronous trap is caused by an operation or by the instruction being
executed. It can be an actual trap instruction, or a H/W error such as bad
address alignment, a bad address (bus timeout), an illegal instruction, or a
floating-point coprocessor error. These traps are taken immediately: the H/W
stops the current instruction dead in its tracks and heads for kernel space.
- A synchronous trap occurs before the instruction changes any state in the
processor. So when the trap is caused by a recoverable H/W fault, the
instruction can be restarted once trap handling has finished and the problem
has been recovered from. Page faults are a good example.
An asynchronous trap can be requested at any time and is processed only when
an instruction has completely finished.
These traps are caused by external events such as interrupts.
They do not affect the operation of the instruction; they only cause a break
in the instruction stream, as if a subroutine call into the kernel had been
planted invisibly in the code.
- Both kinds of trap can be taken from a user program and from inside the
kernel. Both switch to kernel (supervisor) mode and transfer control to the
kernel trap code, where software decides what to do about the trap.
Thus a page fault from a user program is generally acceptable: the kernel
loads the appropriate page and lets the instruction continue.
A page fault from the kernel, however, is bad news, and the trap code stops
with a panic.
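The decision the trap code makes for a page fault can be sketched as follows. This is a toy model under stated assumptions: the function name handle_page_fault and its string results are hypothetical and only mirror the rule described above.

```python
def handle_page_fault(from_kernel, address):
    """Return what the trap code does with a page fault, per the rule above."""
    if from_kernel:
        # Kernel code is not paged, so a fault here means something is broken.
        return "panic: page fault in kernel mode at 0x%x" % address
    # A user-mode fault is routine: bring the page in and retry the instruction.
    return "load page for 0x%x and restart the instruction" % address


print(handle_page_fault(False, 0x20000))
print(handle_page_fault(True, 0x0))
```

The same trap type leads to two very different outcomes, which is why the mode the trap came from matters when reading a panic message.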
6. Trap sequence
- The H/W performs one sequence of operations whether the trap is a
synchronous fault or an asynchronous interrupt.
Interrupt requests, page faults, illegal instructions, and system calls are
all handled in the same way.
The trap recognition sequence passes control to the kernel and enters kernel
(supervisor) mode with saved information about where the trap occurred and
what kind of trap it was.
- trap sequence as performed by the H/W looks like:
1) Recognize the trap
2) Get to a new window ( an implicit save instruction)
3) Set TBR according to the trap type
4) Force a branch to the trap instructions at the address in the TBR
- Turning off the Enable Traps bit delays interrupt recognition, so it must be
kept off for as short a time as possible, and the code that runs in that
state must be written very carefully: if a trap is requested while traps are
disabled, a watchdog reset occurs.
- The current window pointer (CWP, in the Processor Status Register) points to
the register window currently in use. The registers behave like a circular
buffer, so the pointer moves around the complete register set in a circle.
Eventually the windows overlap, and the window that the new pointer selects is
not actually free for use. This is the source of the window overflow trap (or
a window underflow, when moving in the other direction).
Because another trap at this moment would cause a watchdog reset, the CWP is
changed anyway and may point to an invalid window. For this reason the H/W and
S/W (the trap handling code) can use only the local (%l0-%l7) registers.
The other registers are not touched.
This produces a nonstandard stack frame on the stack; for example, the return
address (in %i7) is not really a valid pointer.
The Trap Base Register is normally set up once during system initialization
and points to a page boundary.
Trap Base Address | Trap Type | 0000
(20 bits)         | (8 bits)  | (4 bits)
- The low 4 bits are always 0, and the next 8 bits are the trap type field,
filled in automatically by the H/W according to the type of trap.
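The TBR layout above means each trap type gets its own 16-byte slot in the trap table. A small sketch of how the branch address is formed, assuming the 20/8/4 bit split shown (the function name and the base address 0xF0004000 are hypothetical examples):

```python
def trap_vector_address(trap_base, trap_type):
    """Combine the 20-bit trap base address with the 8-bit trap type field."""
    if not 0 <= trap_type <= 0xFF:
        raise ValueError("trap type must fit in 8 bits")
    # Upper 20 bits: table base. Bits 11..4: trap type. Low 4 bits: always zero.
    return (trap_base & 0xFFFFF000) | (trap_type << 4)


# Trap type 9 (data access exception) with a hypothetical table base.
print(hex(trap_vector_address(0xF0004000, 9)))
```

Because the low 4 bits are zero, consecutive trap types are 16 bytes apart, which leaves room for four 32-bit instructions per entry.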
7. Trap frames
- Structurally, a trap frame is no different from any other type of stack
frame. The trap frame holds the address of the instruction that caused the
trap in local register %l1 and the next PC address in local register %l2.
As mentioned above, this is done by the H/W.
The S/W that handles the trap may do other things with the registers, but
normally at least the PC address is available in %l1.
- Synchronous traps resulting from an instruction usually leave a frame from
the fault function or trap function in the stack trace, with the trap frame
appearing right after it.
- Asynchronous faults, usually caused by external device interrupts, can be
recognized by the interrupt-handling code. This may be a clock function such
as hardclock, or specific code dedicated to one particular interrupt level
(level 10). The return address on the stack that refers to such functions, an
interrupt or fault handler, usually points to the trap frame just before it.
Looking carefully at such a frame, the code address is in %l1, and %l2 usually
holds that address plus 4.
Device interrupts are usually recognized by the name of the interrupt service
routine, which usually ends in int. For example, zsint() is the service
routine for the ZS (serial keyboard/mouse) device.
8. Trap types
- Each trap type has a unique number, which is used to modify the Trap Base
Register and so direct the CPU to the correct trap-handling routine.
The types assigned by the SPARC chip specs correspond roughly to their
priority. Trap priorities only matter when trap or interrupt requests appear
at the same time. After you have seen a few Bad Trap panics these numbers will
become familiar (for example, a data fault is trap type 9).
- The most common trap types and their meanings
1 : Instruction access exception (text fault)
2 : Illegal instruction
3 : Privileged instruction
4 : Floating-point disabled
5 : Window overflow
6 : Window underflow
7 : Memory address alignment error
8 : Floating-point exception
9 : Data access exception (data fault)
17: Interrupt level 1
18: Interrupt level 2, and so on up to
31: Interrupt level 15
128: Software trap #0, and so on up to
255: Software trap #127
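When reading a Bad Trap message it can help to turn the type number back into a name. A minimal lookup sketch built only from the table above (the function name describe_trap is hypothetical):

```python
FIXED_TRAP_TYPES = {
    1: "instruction access exception (text fault)",
    2: "illegal instruction",
    3: "privileged instruction",
    4: "floating-point disabled",
    5: "window overflow",
    6: "window underflow",
    7: "memory address alignment error",
    8: "floating-point exception",
    9: "data access exception (data fault)",
}


def describe_trap(trap_type):
    """Map a trap type number to the description given in the table above."""
    if trap_type in FIXED_TRAP_TYPES:
        return FIXED_TRAP_TYPES[trap_type]
    if 17 <= trap_type <= 31:
        return "interrupt level %d" % (trap_type - 16)
    if 128 <= trap_type <= 255:
        return "software trap #%d" % (trap_type - 128)
    return "not listed in the table"


print(describe_trap(9))
print(describe_trap(0x1A))
```

Types that the table does not list fall through to a neutral answer rather than a guess.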
9. Returning from traps
- The system must be able to return to the code that was interrupted or in
which the trap occurred. A special instruction, rett, performs this return
from trap operation. It undoes the sequence of events that took place when
the H/W recognized the trap.
10. panic() routine.
- The panic() routine abruptly interrupts all normal process scheduling.
From the user's point of view the system is dead. panic() copies the contents
of memory to the dump device as they are. By default, the dump device is
usually the primary swap device. It is rare to see a separate chunk of disk
used for dumps, but the system can be set up that way. On most UNIX systems
the dump device must be a disk partition. On some systems a tape drive can be
specified.
- panic() records critical information about the current state of the CPU.
This information includes the CPU registers, the stack pointer, and various
state registers.
- Once panic() has finished dumping memory to the dump device, it reboots the
system.
11. Panic messages
- Depending on the system programmer and the operation in progress, some panic
messages are quite terse, while others provide considerable detail.
Sometimes you will see not only the name of the calling program and the
variables in use but even the line number in the source, and sometimes only a
cryptic word that means something to the programmer alone.
12. Kernel Tracebacks
- Determining the exact cause of a panic requires the source code, but looking
at the stack can often provide interesting information that gives a clue to
the nature of the problem.
Sun-3 systems push parameters onto the stack for a function call, whereas
Sun-4/SPARC systems use registers.
A Sun-3 stack traceback therefore shows a varying number of parameters,
but a SPARC stack always shows exactly six parameters.
Some of these may be scratch registers while others are valid, and there is no
way of knowing how many parameters were really passed without checking the
code.
- The stack traceback normally shows the last routine called when the code
died or, for a H/W fault, the PC value at the actual location.
The ?i command in adb shows the real function; try it. Also, on SPARC
systems traps save the PC value in different registers, which can make the
traceback look erroneous. On Sun-4 systems you will in many cases ignore the
address just before the trap function, because it is not necessarily valid.
Although it is not easy to determine what the parameters really are, several
zeros, small constants, or odd numbers in the first few registers can indicate
bad parameters passed down the chain.
- Many times device drivers are involved.
Check for these in the traceback.
Driver routine names generally begin with a 2- or 3-letter abbreviation, the
same one that the probe routine prints as the device name at boot time.
Examples are xystrate, zsopen, and stwrite; routines beginning with str are
STREAMS-related.
Also watch for interrupt service routines. If xyintr appears in the stack,
the traceback below it is generally unrelated: the panic or trap occurred in
interrupt code and is probably related to the device, not to the current
process context.
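Picking interrupt service routines out of a traceback is a simple name match. The sketch below is a rough heuristic, assuming routine names end in int or intr as in zsint and xyintr; the function name and the sample traceback are hypothetical, and ordinary names that happen to end in "int" would also match.

```python
def interrupt_routines(functions):
    """Return the names that look like interrupt service routines."""
    # Heuristic only: matches the zsint / xyintr naming pattern by suffix.
    return [name for name in functions if name.endswith(("int", "intr"))]


# Hypothetical list of function names copied from a stack traceback.
trace = ["panic", "trap", "xyintr", "zsopen", "zsint"]
print(interrupt_routines(trace))
```

Any name it returns is a hint that the frames below it belong to whatever process happened to be running when the interrupt arrived.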
< 4. Watchdog Reset >
1. What is a watchdog ?
- Sometimes a system prints the message "watchdog reset" on the console and
drops to the PROM. This is not a panic. The system is no longer in control:
it does not dump memory to disk, and the CPU is simply reset.
- The underlying cause of a watchdog reset may be H/W related, but it is
usually a S/W problem. The immediate cause is a trap, such as a page fault,
that occurs while another trap is being handled. The kernel handles a trap
with the Enable Traps bit in the PSR (Processor Status Register) reset
(turned off), which prevents the CPU from processing another trap until the
first one has been dealt with. In other words, no other trap is supposed to
occur until the system has completely handled the first trap. If for some
reason a trap does occur during this period, the system has to take the trap
but cannot because the bit is off, so it quits immediately.
That is a watchdog reset: an unrecoverable situation that essentially forces
the CPU into its reset state.
After a watchdog reset the only thing you can do is reboot.
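The Enable Traps rule can be modelled with a single flag. This is a toy simulation for illustration only, not a description of real hardware; the class and method names are hypothetical.

```python
class WatchdogReset(Exception):
    """Raised by the toy model when a trap arrives while traps are disabled."""


class ToyCpu:
    def __init__(self):
        self.enable_traps = True

    def take_trap(self, name):
        if not self.enable_traps:
            # No way to take the trap and no way to continue: reset.
            raise WatchdogReset("trap %r while traps disabled" % name)
        # Traps stay disabled while this one is being handled.
        self.enable_traps = False

    def return_from_trap(self):
        # Handling is finished, so traps may be taken again.
        self.enable_traps = True


cpu = ToyCpu()
cpu.take_trap("interrupt")
try:
    cpu.take_trap("page fault")     # second trap before the first finished
except WatchdogReset as exc:
    print("watchdog reset:", exc)
```

In the model, as in the description above, the second trap is not queued or deferred; the only outcome is a reset.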
- Because of the nature of a watchdog reset, not even kadb can catch one when
it happens. However, you can obtain some status information before rebooting
with a few simple OpenBoot PROM commands.
2. Can you get a core file ?
- Not usually. Because of the destructive nature of a watchdog, even if you
get the boot PROM ok prompt the CPU registers have already been destroyed, and
the sync command will either fail or give you a useless core dump that is
unreadable or contains no good data to look at. It is always worth a try, but
there are other things you should do first.
3. What do you do next ?
- Once you have the boot PROM ok prompt, you can use several important PROM
commands to dump out information about the state of the system at the time it
took the watchdog:
* .registers - Displays many of the kernel internal CPU registers.
* .locals - Dumps out the registers in the current register "window."
These are the registers that were in use at the time of the crash.
* .psr - Prints the Processor Status Register contents in a readable format.
* ctrace - Displays the return stack (like $c in adb)
* wd-dump (sun4d only)
- Unfortunately the kernel is not running at this point, so you cannot save
this information to a file. You will probably have to write it down on paper.
4. Watchdog analysis.
- Because a watchdog reset occurs while the system is processing traps, the
actual PC value is of little use: it only leads you into the kernel trap
handling code. The trace information is the most important and useful output.
While you are in the PROM the kernel is not running, and the kernel symbol
table is not available to the PROM code, so the output of the PROM commands is
entirely hexadecimal, raw numeric addresses. Once the system has been rebooted
you can use adb on the live system to match the addresses to functions in the
kernel: address/i displays the location and instruction for each address taken
from the stack trace.
5. Summary
- Analyzing a watchdog reset is not an easy task. Only a few PROM commands are
available, and you will not always get useful information in return for your
effort. If several watchdog resets occur, you may get consistent results and
learn which functions are involved.
Even though watchdog resets are said to be a software problem, they are often
related to a particular piece of H/W (CPU, memory, M/B...). With luck this can
be seen from the stack trace. When dealing with a system that is suffering
from watchdog resets, look at the system as a whole, at both the H/W and the
S/W, for where the problem lies.
Revision History
Date written : 96.06.13
Author : 이승훈
Date revised :
Revised by :