Non Maskable Interrupt

 
==About==
''As posted by Brendan in [[Topic:11633|this]] thread''
 
The following is based on my own research into chipsets (and some unfortunate guess-work to fill in the gaps), and shouldn't be considered "100% correct"...
 
AFAIK for older computers NMI was used for RAM errors and unrecoverable hardware problems. For newer computers these things may be handled using machine check exceptions and/or SMI. For the newest chipsets (at least for Intel) there's also a pile of TCO stuff ("total cost of ownership") that is tied into it all (with a special "TCO IRQ" and connections to SMI/SMM, etc). Somehow all of the TCO stuff is/can be connected to an onboard ethernet controller, and (at least part of it) is intended for remote monitoring of the system. Unfortunately the chipset documentation I've been reading doesn't say how BIOSes normally configure the chipset, and the chipsets themselves support several different options in each case. For example, a RAM error could be handled by the chipset itself, it could generate an SMI (where the BIOS/SMM handler does "RAM scrubbing" in software), it could generate a "TCO interrupt", etc. If you add it all up it's a huge complex mess (TCO + SMI + SMBus + northbridge + PCI bus/controllers + PCI-to-LPC bridge + god-knows-what) that can be completely different between motherboards (even motherboards with the same chipset).
 
The short version of this story is that there are really only two reasons for an NMI. The first is a hardware failure. The second is a "watchdog timer", which can be used to detect when the kernel itself locks up (and is sometimes also used for more accurate profiling, as it allows EIP to be sampled even when IRQs are disabled).
outb(0x70, inb(0x70)|0x80);
}
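
The fragment above sets bit 7 of I/O port 0x70, which masks NMI delivery on PC-compatible hardware. A minimal sketch of a matching enable/disable pair might look like the following. The names <code>nmi_enable</code> and <code>nmi_disable</code>, and the <code>fake_ports</code> array standing in for real port I/O, are assumptions made so the example can run outside a kernel; a real kernel would use its own <code>inb</code>/<code>outb</code> primitives instead.

```c
#include <stdint.h>

#define CMOS_INDEX_PORT  0x70u  /* bit 7 of this port gates NMI delivery */
#define NMI_DISABLE_BIT  0x80u

/* Stand-in for real port I/O so the sketch runs in user space.
 * Assumption: a kernel would replace these helpers with its own inb/outb. */
static uint8_t fake_ports[0x100];

static uint8_t inb(uint16_t port)
{
    return fake_ports[port & 0xFF];
}

static void outb(uint16_t port, uint8_t value)
{
    fake_ports[port & 0xFF] = value;
}

/* Clear bit 7 of port 0x70 so NMIs reach the CPU again. */
static void nmi_enable(void)
{
    outb(CMOS_INDEX_PORT, inb(CMOS_INDEX_PORT) & (uint8_t)~NMI_DISABLE_BIT);
}

/* Set bit 7 of port 0x70 to mask NMIs, as in the fragment above. */
static void nmi_disable(void)
{
    outb(CMOS_INDEX_PORT, inb(CMOS_INDEX_PORT) | NMI_DISABLE_BIT);
}
```

Both helpers use a read-modify-write so that the low seven bits of the port are left untouched.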
 
When an NMI occurs you can check System Control Ports A and B, at I/O addresses 0x92 and 0x61 respectively, to get an indication of what caused the error:
System Control Port A (0x92) layout:
{| class="wikitable"
! Bit !! Description
|-
| 0 || Alternate hot reset
|-
| 1 || Alternate gate A20
|-
| 2 || Reserved
|-
| 3 || Security Lock
|-
| 4* || Watchdog timer status
|-
| 5 || Reserved
|-
| 6 || HDD 2 drive activity
|-
| 7 || HDD 1 drive activity
|}
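
As a rough illustration of reading the Port A layout, the sketch below tests the watchdog timer status bit in a byte read from port 0x92. It assumes the starred bit (bit 4) is the one of interest when diagnosing an NMI. The function and macro names are hypothetical, and the byte is passed in as a parameter so the code does not depend on any particular <code>inb</code> implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYS_CTRL_PORT_A         0x92u      /* I/O address of Port A */
#define PORT_A_WATCHDOG_STATUS  (1u << 4)  /* bit 4: watchdog timer status */

/* Returns true if the watchdog timer status bit is set in a byte
 * previously read from System Control Port A. */
static bool port_a_watchdog_fired(uint8_t port_a)
{
    return (port_a & PORT_A_WATCHDOG_STATUS) != 0;
}
```

In an NMI handler this would typically be called with the value read from the port, for example <code>port_a_watchdog_fired(inb(0x92))</code>.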
 
 
System Control Port B (0x61) layout:
{| class="wikitable"
! Bit !! Description
|-
| 0 || Timer 2 tied to speaker
|-
| 1 || Speaker data enable
|-
| 2 || Parity check enable
|-
| 3 || Channel check enable
|-
| 4 || Refresh request
|-
| 5 || Timer 2 output
|-
| 6* || Channel check
|-
| 7* || Parity check
|}
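
The Port B status bits can be decoded in a similar way. This is only a sketch: it assumes the starred bits (6 and 7) are the ones that indicate an NMI source, the type and function names are hypothetical, and checking parity before channel check is an arbitrary choice rather than anything the hardware requires.

```c
#include <stdint.h>

#define PORT_B_CHANNEL_CHECK  (1u << 6)  /* bit 6: channel check */
#define PORT_B_PARITY_CHECK   (1u << 7)  /* bit 7: parity check */

typedef enum {
    NMI_CAUSE_UNKNOWN = 0,     /* neither status bit is set */
    NMI_CAUSE_CHANNEL_CHECK,
    NMI_CAUSE_PARITY_CHECK
} nmi_cause_t;

/* Classifies a byte previously read from System Control Port B (0x61).
 * If both bits are set, the parity check is reported (arbitrary priority). */
static nmi_cause_t port_b_nmi_cause(uint8_t port_b)
{
    if (port_b & PORT_B_PARITY_CHECK)
        return NMI_CAUSE_PARITY_CHECK;
    if (port_b & PORT_B_CHANNEL_CHECK)
        return NMI_CAUSE_CHANNEL_CHECK;
    return NMI_CAUSE_UNKNOWN;
}
```

A handler could pass in <code>inb(0x61)</code> and fall back to other checks (such as the Port A watchdog bit) when the result is <code>NMI_CAUSE_UNKNOWN</code>.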
 
 
[[Category:Interrupts]]