Non Maskable Interrupt

The '''Non-Maskable Interrupt''' ('''NMI''') is a hardware-driven interrupt much like the PIC interrupts, but the NMI goes either directly to the CPU, or via another controller (e.g., the ISP), in which case it can be masked.
 
==About==
NMIs occur for RAM errors and unrecoverable hardware problems. On newer computers these things may be handled using machine check exceptions and/or SMI. The newest chipsets (at least for Intel) also have a pile of TCO ("total cost of ownership") features tied into it all, with a special "TCO IRQ", connections to SMI/SMM, and so on. All of the TCO features are, or can be, connected to an onboard Ethernet controller, and at least part of this is intended for remote monitoring of the system. Unfortunately, the chipset documentation doesn't say how BIOSes normally configure the chipset, and the chipsets themselves support several different options in each case. For example, a RAM error could be handled by the chipset itself, it could generate an SMI (where the BIOS/SMM handler does "RAM scrubbing" in software), it could generate a "TCO interrupt", etc. If you add it all up it's a huge complex mess (TCO + SMI + SMBus + Northbridge + PCI bus/controller(s) + PCI-to-LPC bridge + god-knows-what) that can be completely different between motherboards (even motherboards with the same chipset).
 
The short version of this story is that there are really only two reasons for an NMI. The first reason is a hardware failure. The second reason is a "watchdog timer", which can be used to detect when the kernel itself locks up (and is sometimes also used for more accurate profiling, as it allows EIP to be sampled even when IRQs are disabled).
 
If a hardware failure caused an NMI then there's no way to figure out which piece of hardware caused the NMI. In this case you may want to inform the user that a hardware error has occurred, and then the kernel should shut down or reset the machine.
 
The watchdog timer must be set up by the OS first. This can actually be done even when the chipset itself doesn't have a special watchdog timer for it (e.g. by setting the PIT, RTC/CMOS IRQ or an HPET IRQ to "NMI, send to all CPUs" in the I/O APIC). In this case you want the watchdog timer to be fast (i.e. no slow hardware task switching and cache flushing) and you'd also want all CPUs to share the same timer, which means all CPUs would receive the same IRQ at the same time (which matters if the NMI handler is entered through a task gate, because of the busy flag in the TSS).
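As a rough illustration of the "NMI, send to all CPUs" setting, the sketch below builds a 64-bit I/O APIC redirection table entry with the delivery mode field set to NMI. It assumes the conventional 82093AA-style layout (delivery mode in bits 8-10, mask in bit 16, destination in bits 56-63), and it assumes that physical destination 0xFF acts as a broadcast, which should be checked against the target hardware. The function names are hypothetical, and actually writing the entry to the I/O APIC is left out.

```c
#include <stdint.h>

/* Field positions assumed from the conventional I/O APIC redirection entry layout. */
#define IOAPIC_DELIVERY_NMI (4ULL << 8)  /* delivery mode, bits 8-10 = 100b */
#define IOAPIC_DEST_SHIFT   56           /* destination field, bits 56-63 */

/* Build an unmasked, edge-triggered, physical-destination entry that
 * delivers the pin's interrupt as an NMI. The vector field is left at 0
 * because it is not used for NMI delivery. */
uint64_t ioapic_nmi_entry(uint8_t dest_apic_id)
{
    uint64_t entry = 0;
    entry |= IOAPIC_DELIVERY_NMI;
    entry |= (uint64_t)dest_apic_id << IOAPIC_DEST_SHIFT;
    return entry;
}

/* The entry is written as two 32-bit registers; these split it. */
uint32_t ioapic_entry_low(uint64_t entry)  { return (uint32_t)entry; }
uint32_t ioapic_entry_high(uint64_t entry) { return (uint32_t)(entry >> 32); }
```

For example, <code>ioapic_nmi_entry(0xFF)</code> would be the entry for a timer pin whose interrupt should reach every CPU as an NMI, under the broadcast assumption above.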
 
As an alternative, you could also use the local APIC's timer or the performance monitoring counter overflow for a "per CPU" watchdog timer. Unfortunately these things are usually used for other purposes.
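Whichever NMI source is used, the lockup check itself can be quite small. The following is a minimal sketch, not a definitive design: the ordinary timer IRQ bumps a heartbeat counter, and the NMI handler reports a lockup when the counter has not moved for several consecutive NMIs. The structure, the function names and the threshold of 3 are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_MAX_MISSES 3u  /* hypothetical threshold */

struct watchdog_state {
    volatile uint64_t heartbeat; /* bumped by the normal (maskable) timer IRQ */
    uint64_t last_seen;          /* heartbeat value seen at the previous NMI */
    unsigned misses;             /* consecutive NMIs without progress */
};

/* Call from the ordinary timer interrupt handler. */
void watchdog_touch(struct watchdog_state *w)
{
    w->heartbeat++;
}

/* Call from the NMI handler. Returns true when the heartbeat has not
 * changed for WATCHDOG_MAX_MISSES consecutive NMIs. */
bool watchdog_nmi_check(struct watchdog_state *w)
{
    uint64_t now = w->heartbeat;
    if (now != w->last_seen) {
        w->last_seen = now;
        w->misses = 0;
        return false;
    }
    w->misses++;
    return w->misses >= WATCHDOG_MAX_MISSES;
}
```

With a shared timer there would be one <code>watchdog_state</code> per CPU, so that each CPU only touches its own counters inside the NMI handler.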
 
==Usage==
 
The NMI is asserted (set high) by the memory module when a memory parity error occurs.
 
Be careful about disabling the NMI and the PIC for extended periods of time (mind you, watchdog timers typically use NMIs).
 
On the XT the NMI mask is controlled by bit 7 of I/O port 0xA0. On the AT the NMI can be masked by setting bit 7 on I/O port 0x70. This port is shared with the CMOS RAM index register, which uses bits 0 through 6 of I/O port 0x70. The CMOS RTC expects a read from or write to the data port 0x71 after any write to index port 0x70, or it may go into an undefined state. There may also need to be an [[Inline_Assembly/Examples#I/O_access|I/O delay]] between accessing the index and data registers. The index port 0x70 may be a write-only port and always return 0xFF on read. Hence the bit masking below, which tries to preserve bits 0 through 6 of the CMOS index register, may not work, nor may it be possible to retrieve the current state of the NMI mask from port 0x70.
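<syntaxhighlight lang="C">
/* Clear bit 7 of the CMOS index port (0x70) to unmask the NMI. */
void NMI_enable(void) {
    outb(0x70, inb(0x70) & 0x7F);
    inb(0x71); /* dummy read of the data port, as the RTC expects */
}

/* Set bit 7 of the CMOS index port (0x70) to mask the NMI. */
void NMI_disable(void) {
    outb(0x70, inb(0x70) | 0x80);
    inb(0x71); /* dummy read of the data port, as the RTC expects */
}
</syntaxhighlight>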
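Since port 0x70 may not be readable, one possible workaround is to keep a software shadow of the last value written to it, and to derive both the NMI mask state and the current index from that copy. This is only a sketch under that assumption. The names are hypothetical, and <code>sim_outb</code>/<code>sim_inb</code> are stand-ins that merely record the write so the example can run outside a kernel; a real kernel would call its own <code>outb</code>/<code>inb</code> instead.

```c
#include <stdint.h>

/* Stand-in port backend (hypothetical): records the last byte written to 0x70. */
uint8_t last_index_write;

void sim_outb(uint16_t port, uint8_t value)
{
    if (port == 0x70)
        last_index_write = value;
}

uint8_t sim_inb(uint16_t port)
{
    (void)port;
    return 0xFF; /* behave like a port that cannot be read back */
}

/* Software copy of what was last written to port 0x70 (bit 7 = NMI mask). */
uint8_t cmos_shadow;

void cmos_write_index(uint8_t value)
{
    cmos_shadow = value;
    sim_outb(0x70, value);
    (void)sim_inb(0x71); /* dummy data-port access after the index write */
}

/* Mask or unmask the NMI while keeping the currently selected CMOS index. */
void nmi_set_masked(int masked)
{
    uint8_t index = cmos_shadow & 0x7F;
    cmos_write_index(masked ? (uint8_t)(index | 0x80) : index);
}

/* Select a CMOS register while keeping the current NMI mask bit. */
void cmos_select(uint8_t reg)
{
    cmos_write_index((uint8_t)((cmos_shadow & 0x80) | (reg & 0x7F)));
}

int nmi_is_masked(void)
{
    return (cmos_shadow & 0x80) != 0;
}
```

The shadow is only correct if every write to port 0x70 goes through these helpers, and it says nothing about whatever state the firmware left behind before the first write.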
 
When an NMI occurs you can check system control ports A and B at I/O addresses 0x92 and 0x61 respectively to get an indication of what caused the error:
 
System Control Port A (0x92) layout:
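Decoding the two status bytes inside an NMI handler could look roughly like the sketch below. The bit positions are assumptions taken from the conventional AT/PS/2 layout (port A bit 4 as watchdog timer status, port B bit 6 as channel check and bit 7 as parity check) and should be verified against the port layouts for the target machine. The enum and function names are hypothetical, and the function takes the two bytes as arguments so that reading the ports is left to the caller.

```c
#include <stdint.h>

/* Hypothetical cause flags returned by nmi_decode_cause(). */
enum nmi_cause {
    NMI_CAUSE_WATCHDOG      = 1u << 0, /* watchdog timer fired */
    NMI_CAUSE_CHANNEL_CHECK = 1u << 1, /* failure reported on the bus */
    NMI_CAUSE_PARITY_CHECK  = 1u << 2  /* memory read or write error */
};

/* port_a: value read from 0x92, port_b: value read from 0x61. */
unsigned nmi_decode_cause(uint8_t port_a, uint8_t port_b)
{
    unsigned cause = 0;
    if (port_a & 0x10)  /* assumed: port A bit 4 */
        cause |= NMI_CAUSE_WATCHDOG;
    if (port_b & 0x40)  /* assumed: port B bit 6 */
        cause |= NMI_CAUSE_CHANNEL_CHECK;
    if (port_b & 0x80)  /* assumed: port B bit 7 */
        cause |= NMI_CAUSE_PARITY_CHECK;
    return cause;
}
```

A handler might call this with the results of <code>inb(0x92)</code> and <code>inb(0x61)</code>, report any hardware cause to the user, and then halt or reset the machine.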
 
[[Category:Interrupts]]
 
[[de:Non Maskable Interrupt]]