ISA DMA

From OSDev.wiki
'''ISA DMA''' ('''Industry Standard Architecture Direct Memory Access''') in today's PC architecture, like ISA itself, is in many ways like an appendix. It is used by the floppy disk controller, ISA sound cards, ISA network cards and parallel ports (if they support ECP mode). Whilst interrupt, keyboard and timer interface circuits have obvious and relevant uses, the ISA DMA controller and its programming interface are still well and truly stuck in the 1970s, where they were first designed. Modern PCI controllers typically have their own 'bus mastering', which is far better.


The idea behind DMA is that you can set up a 'channel' with an address pointing into memory and the length of the data to be transferred. Once it is set up, you can tell the peripheral owning the 'channel' to do whatever it is supposed to do (e.g. read a sector). Then the CPU can go on to something else. Periodically, when the data bus isn't being used by the CPU, the DMA chip takes over and transfers data between the peripheral and memory without involving the CPU. When the transfer is complete (e.g. an entire sector has been sent to the floppy drive), the DMA chip will then signal that it is finished. The DMA chip can even signal if it has run out of data, allowing the CPU to program some new information and another transaction to proceed.
Of course, all good ideas can have downsides, and while Intel can't really be blamed for what is about to be described, IBM certainly can.


== DMA Genesis, Chapter 1, Verse 1 ==

In the beginning there was a PC, but the PC was slow. IBM looked down from the heavens and said "Slap on a DMA controller - that should speed it up." IBM's heart was in the right place, but its collective brains were elsewhere, as the DMA controller never met the needs of the system. The PC/AT standard contains two Intel 8237A DMA chips, connected as Master/Slave. The second chip is Master, and its first line (Channel 4) is used by the first chip, which is Slave. (This is unlike the interrupt controller, where the first chip is Master.) The 8237A was designed for the old 8-bit 8080 processor, and this is probably the main reason for so many DMA problems. The 8088 and 8086 processors chosen by IBM for its PC were too advanced for the DMA controller.


DMA Channel 0 is unavailable as it was used for memory refresh, and remains reserved because of this (even though modern computers don't use it). DMA channel 4 cannot be used for peripherals because it is used for cascading the other DMA controller.


DMA channels 1, 2 and 3 can only transfer 64 KB of information at a time (the internal address and count registers are only 16 bits wide). To extend this, IBM added an ''external'' page register allowing access to 16 MB of memory (24 bits total). If a DMA transfer crosses a 64 KB boundary, the internal address register wraps around to zero, but the external page register is ''not'' incremented. The DMA controller will happily provide the attached peripheral with whatever it finds at the new address. As a rule, 8-bit DMA transfers must not cross 64 KB boundaries and must never be bigger than 64 KB.
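A minimal sketch of the arithmetic above, in C like the rest of this page. It splits a physical address into the page register value and the 16-bit address register value for an 8-bit channel. It also works out which byte the controller really touches partway through a transfer. The helper names (dma8_page, dma8_offset, dma8_effective_address) are hypothetical. The code only models the behaviour described here and does not touch any hardware.

```c
#include <stdint.h>

/* Hypothetical helpers for an 8-bit channel (1, 2 or 3).
 * The 24-bit physical address is split into an 8-bit page register value
 * (bits 23-16) and a 16-bit address register value (bits 15-0). */
static inline uint8_t dma8_page(uint32_t phys)
{
    return (uint8_t)((phys >> 16) & 0xFF);
}

static inline uint16_t dma8_offset(uint32_t phys)
{
    return (uint16_t)(phys & 0xFFFF);
}

/* Physical address the controller actually accesses for byte number i
 * of a transfer starting at phys. The 16-bit address register wraps
 * from 0xFFFF to 0x0000, but the page register never changes. */
static inline uint32_t dma8_effective_address(uint32_t phys, uint32_t i)
{
    uint16_t offset = (uint16_t)(dma8_offset(phys) + i); /* wraps modulo 64 KB */
    return ((uint32_t)dma8_page(phys) << 16) | offset;
}
```

For example, a transfer starting at 0x1FFF0 ought to continue into 0x20000. In practice, byte 0x20 of that transfer lands at 0x10010, back at the bottom of the same 64 KB block.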


DMA channels 5, 6 and 7 transfer data 16 bits at a time. Their address and count registers hold word values, so the byte address is shifted one bit to the right and the lowest address bit is always zero. The addressing is otherwise the same, except that the boundaries are now 128 KB. You can transfer up to 128 KB of information at a time before the DMA boldly transfers what has never been transferred before...
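The same idea for the 16-bit channels, as a sketch that assumes the usual PC/AT wiring. On that wiring, the address register supplies physical bits 16-1, and page register bits 7-1 supply physical bits 23-17. Bit 0 of the page register is ignored. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for a 16-bit channel (5, 6 or 7).
 * The address register holds a word address: physical bits 16-1.
 * The page register supplies bits 23-17; its bit 0 is ignored by the
 * hardware, so writing (phys >> 16) is harmless. */
static inline uint8_t dma16_page(uint32_t phys)
{
    return (uint8_t)((phys >> 16) & 0xFF);
}

static inline uint16_t dma16_offset(uint32_t phys)
{
    return (uint16_t)((phys >> 1) & 0xFFFF); /* byte address shifted right by one */
}

/* Rebuild the physical address the controller emits from the two
 * register values: A0 is always 0, A16-A1 come from the address
 * register, A23-A17 from page register bits 7-1. */
static inline uint32_t dma16_physical(uint8_t page, uint16_t offset)
{
    return ((uint32_t)(page & 0xFE) << 16) | ((uint32_t)offset << 1);
}
```

Because the page register's lowest bit is ignored, a word transfer that runs past the end of a 128 KB block wraps to the start of that block. For example, dma16_physical(0x03, 0) is 0x20000, not 0x40000.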


Previously it was mentioned that the DMA controller is able to signal completion and even ask for more information. Unfortunately, this would have made expansion slots too big, so IBM left all of those connections to the DMA chips off. The only way to know that a transfer is complete is for the peripheral to signal an interrupt. This implies that all peripherals using a DMA channel are limited to transfers of no more than 64 KB (8-bit channels) or 128 KB (16-bit channels), for fear of upsetting the DMA controller.


Finally, '''all''' DMA controllers run at 4 MHz (information taken from the ISA specification). '''No exceptions'''.
Ignore that GHz tag on your processor, or the fact that even PCI runs at 33 MHz. DMA controllers are fixed at this rate. This includes EISA and PS/2 32-bit DMA controllers. The only differences between these and the stock ISA DMA controllers are an extra page register and the ability to do 32-bit transfers. The extra page register allows 64 KB transfers anywhere in 4 GiB of address space. These DMA controllers exist only on EISA and MCA systems, which are now obsolete and are not described here.


So after reading all the above, the main points are:
* DMA channels 1, 2 and 3 are for 8-bit transfers;
* DMA channels 5, 6 and 7 are for 16-bit transfers;
* Transfers shouldn't cross 64 KB (8-bit) or 128 KB (16-bit) boundaries;
* Transfers can only be to or from the lowest 16 MB of memory;
* DMA is slow - theoretically 4 MB/second, but more like 500 KB/second due to ISA bus protocols.
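The rules above can be folded into a single check that a driver runs before handing a buffer to the DMA controller. This is a sketch, not a definitive implementation. dma_buffer_ok is a hypothetical name. It takes a ''physical'' address and a length in bytes, and it assumes that 16-bit channels need an even address and an even length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checker for the rules listed above. phys is a physical
 * address, len is the length in bytes. */
static bool dma_buffer_ok(unsigned channel, uint32_t phys, uint32_t len)
{
    const uint32_t limit = 16u * 1024 * 1024; /* ISA DMA reaches only the low 16 MB */
    uint32_t boundary, max_len;

    if (channel >= 1 && channel <= 3) {          /* 8-bit channels */
        boundary = 64u * 1024;
        max_len  = 64u * 1024;
    } else if (channel >= 5 && channel <= 7) {   /* 16-bit channels */
        if ((phys & 1) || (len & 1))
            return false;                         /* must be word aligned */
        boundary = 128u * 1024;
        max_len  = 128u * 1024;
    } else {
        return false;                             /* 0 = refresh, 4 = cascade */
    }

    if (len == 0 || len > max_len)
        return false;
    if (phys >= limit || len > limit - phys)
        return false;                             /* would run past 16 MB */

    /* First and last byte must lie in the same 64 KB / 128 KB block. */
    return (phys / boundary) == ((phys + len - 1) / boundary);
}
```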


Note that:
* Sound Blaster and Sound Blaster Pro only support 8-bit DMA;
* Sound Blaster 16 and later support both 8-bit and 16-bit DMA;
* Floppy disk controllers only support 8-bit DMA and usually use DMA channel 2.


== Busmaster DMA ==


The IBM PC/XT used DMA channel 3 for its hard disk controller. The IBM PC/AT did not use DMA for the hard disk; instead, the processor wrote to the hard disk directly. This was because of the 128 KB limitations outlined above, and because the 286 processor could perform 16-bit transactions at 6 MHz. Even the ISA bus could run at up to 12 MHz, far faster than the 4.77 MHz the DMA controller was running at.


Expansion card designers were also upset with DMA's lack of capabilities, notably the 'Hard-Card' hard disk expansion card manufacturers, who depended on fast data transfer.


To get around the limitations of the on-board DMA controller, expansion card manufacturers began to put their own DMA controllers on their expansion cards. These controllers didn't have any of the design 'features' of the 8237A. They could address all 16 MiB of ISA-addressable memory with the full 16-bit data bus width. They functioned exactly the same way as the on-board DMA, 'stealing bus cycles' when the processor wasn't looking, and thus improved the performance of the system as a whole.


Busmaster DMA is probably easier to understand than normal DMA. Provided you know what registers are on your expansion card, you can send them the address and the amount of data to transfer. The expansion card will send you an interrupt upon completion.
PCI Bus Masters can of course access memory with 32-bit addressing. Newer PCI cards are starting to support 64-bit addressing (although at the moment most don't). Typically, PCI cards use 'scatter-gather' bus mastering, where one page is used as a directory of data pages. This makes them especially suitable for paging operating systems (where contiguous pages are rare).
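To illustrate the scatter-gather idea, the sketch below turns the list of physical pages behind a buffer into a list of (address, length) runs. Pages that happen to be physically adjacent are merged into one run. The struct sg_entry layout and build_sg_list are hypothetical. Every real PCI device defines its own descriptor format, so check the card's documentation for the actual layout.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical scatter-gather entry. Real PCI devices each define their
 * own descriptor layout; this one only illustrates the idea. */
struct sg_entry {
    uint32_t phys;   /* physical start address of a run of pages */
    uint32_t length; /* length of the run in bytes */
};

/* Build a scatter-gather list from the physical addresses of the pages
 * backing a virtually contiguous buffer (one address per page, in order).
 * Physically adjacent pages are merged into a single entry.
 * Returns the number of entries written to out (at most npages). */
static size_t build_sg_list(const uint32_t *page_phys, size_t npages,
                            struct sg_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++) {
        if (n > 0 && out[n - 1].phys + out[n - 1].length == page_phys[i]) {
            out[n - 1].length += PAGE_SIZE;   /* extends the previous run */
        } else {
            out[n].phys = page_phys[i];       /* starts a new run */
            out[n].length = PAGE_SIZE;
            n++;
        }
    }
    return n;
}
```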


ISA Bus Masters are still limited to the lower 16 MiB of memory - this is an ISA limitation (the bus has only 24 address lines).


== Programming the PC DMA controllers ==


=== Paging ===

As this article assumes you are writing your own OS, memory management is very important unless you are writing a 286-based operating system. This is because, as mentioned earlier, all DMA transactions must occur in the lower 16 MiB of memory. (Yes, I'm repeating myself - memorize it - it is crucial.)

To be more specific, all DMA transactions must occur in ''physical memory''. The DMA controller only works with ''physical'' addresses. If you pass it a virtual address happily mapped anywhere in memory, the DMA controller will happily ignore your carefully constructed page mapping and access whatever it decides to. VM86 mode does not use ''physical'' addressing; the 'physical' addresses a VM86 task sees are ''fake''. In VM86 mode, the OS must emulate any DMA transactions on behalf of an application, probably in a totally non-DMA way. Almost all DOS memory managers and operating systems that use V86 mode to run DOS applications supply virtual DMA functions; this includes Windows 3.x enhanced mode and EMM386. These all work in basically the same way: the V86 monitor 'fakes' the DMA transaction. DMA has no knowledge of paging whatsoever, and I should point out that this includes Bus Master DMA.

Finally (and this is a great laugh when it happens), don't forget to 'pin' the pages in memory. Imagine the floppy drive trying to read data from memory while you swap that data out to the hard drive. The DMA chip doesn't know -
"Paging? What's that then? Dunno. You've just mapped in the password file? No problem, I'll write that out to the floppy instead."
DMA memory is like Memory Mapped I/O - do not move - do not page.
(I apologise for sounding really dictatorial, it's just that DMA is so easy to screw up. Lots more things to screw up later - just wait till we get to commands!)


One way to handle this situation is to manage memory below 16 MB using a 'bitmap' so that contiguous pages that don't cross 64 KB or 128 KB boundaries can be found, and use free page stacks (or anything else) for memory above 16 MB.
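A minimal sketch of such a bitmap allocator. It makes three assumptions: pages are 4 KB, the bitmap holds one bit per page below 16 MB (4096 bits), and page 0 is reserved because it holds the real mode interrupt table. The last assumption lets a return value of 0 mean failure. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define LOW_PAGES   (16u * 1024 * 1024 / PAGE_SIZE)  /* 4096 pages below 16 MB */

/* Hypothetical bitmap covering memory below 16 MB: bit set = page in use. */
static uint8_t low_bitmap[LOW_PAGES / 8];

static bool page_used(size_t p) { return low_bitmap[p / 8] & (1u << (p % 8)); }
static void page_mark(size_t p)  { low_bitmap[p / 8] |= (uint8_t)(1u << (p % 8)); }

/* Find npages contiguous free pages below 16 MB that stay inside one
 * block of `boundary` bytes (64 KB for 8-bit channels, 128 KB for 16-bit).
 * Marks them used and returns the physical address, or 0 on failure
 * (page 0 is assumed to be reserved, so the search starts at page 1). */
static uint32_t dma_alloc_low(size_t npages, uint32_t boundary)
{
    size_t per_block = boundary / PAGE_SIZE;   /* pages per 64/128 KB block */
    if (npages == 0 || npages > per_block)
        return 0;

    for (size_t start = 1; start + npages <= LOW_PAGES; start++) {
        /* Skip starts whose run would spill into the next block. */
        if (start / per_block != (start + npages - 1) / per_block)
            continue;
        size_t i = 0;
        while (i < npages && !page_used(start + i))
            i++;
        if (i == npages) {
            for (i = 0; i < npages; i++)
                page_mark(start + i);
            return (uint32_t)(start * PAGE_SIZE);
        }
    }
    return 0;
}
```

To use it, pass 64u * 1024 for a buffer on channels 1-3 and 128u * 1024 for channels 5-7. A run never spans two blocks, so the result always satisfies the boundary rule.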


An example for getting it right (only pseudocode for this; it is up to you to design your memory management):
* When your OS starts, it detects (or is told of) a device requiring DMA memory.
* Map the lowest 64 KB of ''physical'' memory elsewhere in memory, and aim the DMA channel you are setting up at address 0.
* Set up the DMA channel for the correct type of transaction (more on this later).
* For the next DMA channel, map the next 64 KB of ''physical'' memory somewhere, and aim that DMA channel at address 64 KB.
* Set up that DMA channel for the correct type of transaction.
* Finish building your page stacks/bitmaps/whatever, and add any lower memory left over.


Apart from a few housekeeping points, the above example lets you have a flat paged memory area from 0 to forever, while still allowing your OS to use old DMA-dependent peripherals. You do need to keep a copy somewhere of the information in the first 4 KB of memory and in the Extended BIOS Data Area; VM86 tasks need these if you decide to add BIOS/DOS/VESA/PCI support later. The device drivers work with CPU paged memory areas, while the DMA controller is happily working with low memory ''under 16 MiB''. Have the driver write data to the paging relocated area and send commands to the peripheral. The DMA chip will then access ''physical'' memory correctly when asked to do so.


===Registers===
Each 8237A has 5 registers, addressed via I/O space:


{| {{wikitable}}
|}


Each DMA channel has an external page address register that supplies the upper 8 bits of the 24-bit DMA transfer address:


{| {{wikitable}}
|}


* '''REQ3-0''': When set: DMA Request Pending.
* '''TC3-0''': When set: Transfer Complete.


This register isn't very important, given that the 8237 can't send an IRQ to tell you that it has finished. Usually there is no need to poll it, as the peripheral that owns the channel will send an interrupt when a transaction has finished. Polling would be a total waste of time.




;Command Registers 0x08 and 0xD0 (Write)
{| {{wikitable}}
|-


This register really shows how incompatible the 8237 is with the PC hardware.
* Let's start with EXTW and COMP. These increase the speed of DMA transfers by 25% by removing one of the clock cycles. Does it work? No.
* PRIO. When set, this allows DMA priorities to be rotated, allowing freedom and liberty for all peripherals that share the data bus. Does it work? No.
* MMT and ADHE. Did you know that the IBM PC has been able to do memory-to-memory transfers since 1981? That's right: hardware sprites, hardware frame buffering from one location to another. Does it work? No.
* COND. Hooray, the only bit in the command register that does something useful. Setting this bit disables the DMA controller. This lets you set up multiple DMA channels without masking each and every channel.


;Request Registers 0x09 and 0xD2 (Write)
:Used for memory to memory transfers and setting up priority rotation - absolutely useless.


;DMA Channel Mask Registers 0x0A and 0xD4 (Write)


{| {{wikitable}}
|}


To mask a channel for programming (or whatever), set STCL to 1 and use SEL0 and SEL1 to select the channel to be masked.
* Sending 0x04 (binary 100) would mask DMA channel 0 of the appropriate 8237.
* Sending 0x00 (binary 000) would unmask DMA channel 0 of the appropriate 8237.
* To unmask a channel, set STCL to 0.
This register only allows you to mask/unmask a '''single''' DMA channel. If you need to mask multiple DMA channels, you need to use the following:


;DMA Mask Registers 0x0F and 0xDE (Write)
|}


Setting a bit to 1 masks the corresponding channel, and clearing it to 0 unmasks it. Bits 3 to 0 correspond to DMA channels 3, 2, 1, 0 (register 0x0F) or 7, 6, 5, 4 (register 0xDE). Note that masking DMA channel 4 will also mask channels 0, 1, 2 and 3, because channel 4 carries the cascaded first controller.
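The values for both mask registers are easy to compute. This sketch uses hypothetical helper names. It returns port numbers and register values rather than doing the I/O itself. A kernel would pass them to its own outb(port, value) routine, like the one the examples on this page call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single channel mask register: 0x0A (channels 0-3) or 0xD4 (4-7).
 * Bits 1-0 (SEL) pick the channel within the chip, bit 2 (STCL) masks. */
static inline uint8_t dma_single_mask_port(unsigned channel)
{
    return channel < 4 ? 0x0A : 0xD4;
}

static inline uint8_t dma_single_mask_value(unsigned channel, bool mask)
{
    return (uint8_t)((channel & 3) | (mask ? 0x04 : 0x00));
}

/* Multi-channel mask register: 0x0F (channels 0-3) or 0xDE (4-7).
 * Builds the byte for one chip from an 8-bit set (bit n = channel n);
 * a 1 masks the channel, a 0 unmasks it. */
static inline uint8_t dma_multi_mask_value(uint8_t channels, bool second_chip)
{
    return (uint8_t)((second_chip ? channels >> 4 : channels) & 0x0F);
}
```

For example, outb(dma_single_mask_port(2), dma_single_mask_value(2, true)) would write 0x06 to port 0x0A, masking channel 2.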


;DMA Mode Registers 0x0B and 0xD6 (Write)
{| {{wikitable}}
|-
This register is tricky as it depends highly on the peripheral you are programming the DMA controller for.


* '''SEL0''' and '''SEL1''' select the channel you want to change;
* '''TRA0''' and '''TRA1''' select the transfer type:
** 0b00 runs a verify transfer (a self test: addresses are generated, but no data is moved);
** 0b01: the DMA channel is for writing to memory;
** 0b10: the DMA channel is for reading from memory;
** 0b11 is invalid.
* '''AUTO''': When this bit is set, the channel resets itself to the values you programmed into it after a transfer has completed. This is great for floppy transfers. Read in a track, and the values set themselves up for reading again immediately. For writing, you'd only need to alter the transfer mode, not the addresses. Some expansion cards, such as the Sound Blaster 1.x, do not support auto-init DMA, and they will crash if used with it. Sound Blaster 2.0 and later do support auto-init DMA.
* '''IDEC''': Increment/Decrement. When set, the address is decremented after each transfer instead of incremented. I guess if you wanted to write big-endian data to a floppy drive, you could. Big-endian processor based computers wouldn't be using an 8237.
* '''MOD0''' and '''MOD1''': This is where some problems can arise, depending on the peripheral the DMA controller is attached to. The DMA controller has several modes:
** 0b00 = Transfer on Demand;
** 0b01 = Single DMA Transfer;
** 0b10 = Block DMA Transfer;
** 0b11 = Cascade Mode (used to cascade another DMA controller).


Single transfer mode is good for peripherals that cannot buffer a lot of data at once. A good example is early floppy controllers, which have only a very small buffer. The DMA controller and CPU can share the data bus happily. Sound Blaster and Sound Blaster Pro use software or Single Transfer DMA Mode.


Block transfer mode is good for peripherals that can buffer entire blocks of information. A good example of this is an ESDI or SCSI controller without Bus Master DMA.


Demand transfer mode is good for peripherals that start and stop intermittently, such as a tape drive. The drive can read a whole load of information for as long as it can, and then suspend the transfer to move to another section of the tape. Newer floppy controllers also work well with demand transfer, because they have FIFO buffers to store information being read and written. The peripheral controls the flow, and as the information flow is uninterrupted, performance can be gained. CPUs in these later computers generally have caches and can continue working uninterrupted during a demand DMA transfer. Older computers will slow down as their CPUs wait for the data bus to become available.
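The mode byte can be assembled from the fields described above. This sketch assumes the standard 8237 bit layout: SEL in bits 1-0, TRA in bits 3-2, AUTO in bit 4, IDEC in bit 5 and MOD in bits 7-6. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Transfer types (TRA1-TRA0, bits 3-2); 0b00 is the verify/self test. */
enum dma_transfer { DMA_VERIFY = 0, DMA_WRITE_MEM = 1, DMA_READ_MEM = 2 };
/* Modes (MOD1-MOD0, bits 7-6). */
enum dma_mode { DMA_DEMAND = 0, DMA_SINGLE = 1, DMA_BLOCK = 2, DMA_CASCADE = 3 };

/* Build a value for the mode register (0x0B for channels 0-3,
 * 0xD6 for channels 4-7). */
static inline uint8_t dma_mode_byte(unsigned channel, enum dma_transfer tra,
                                    bool autoinit, bool decrement,
                                    enum dma_mode mode)
{
    return (uint8_t)(((unsigned)mode << 6) |
                     (decrement ? 0x20 : 0) |   /* IDEC, bit 5 */
                     (autoinit ? 0x10 : 0) |    /* AUTO, bit 4 */
                     ((unsigned)tra << 2) |     /* TRA1-TRA0, bits 3-2 */
                     (channel & 3));            /* SEL1-SEL0, bits 1-0 */
}
```

For instance, a single-mode transfer into memory on channel 2, as used for floppy reads, gives 0x46. Reading from memory (a floppy write) gives 0x4A, and setting AUTO adds 0x10.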



In addition to the above registers, the 8237 has three command registers:


The first command register is the 'reset flip-flop' register: 0xD8 for channels 4-7, 0x0C for channels 0-3. The DMA controller is an 8-bit chip. To write a 16-bit value into an address register or count register, you must write two 8-bit values, low byte first, to the appropriate register address (see below). Imagine, however, that you only write one 8-bit value and then, the next time, try to write a 16-bit value. The DMA chip will be expecting the high byte, not the low byte. Clearing the flip-flop guarantees getting the address into the DMA controller in the right order. To reset the flip-flop, write anything to the register:


;ASM
void reset_flipflop_DMA()
{
    outb(0xD8, 0xFF);   /* channels 4-7; use port 0x0C for channels 0-3 */
}
</pre>


Another dictate: the reset flip-flop command must be sent ''before any'' 16-bit transaction. The flip-flop simply toggles on every byte access and nothing resets it for you, so if an earlier access left it in the wrong state, your next 16-bit write will be scrambled. Easily missed, and it can cause many problems.
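To make the byte ordering concrete, here is a minimal C sketch that builds the port writes for loading one 16-bit address or count register. It assumes a hypothetical `struct port_write` list that your driver would replay with its own `outb(port, value)`; building a list instead of calling `outb` directly only serves to keep the sketch testable.

```c
#include <stddef.h>
#include <stdint.h>

/* One I/O port write; a real driver would replay the list with outb(). */
struct port_write {
    uint16_t port;
    uint8_t  value;
};

/* Append the writes that load a 16-bit value into an 8237 address or count
   register. flipflop_port is 0x0C (DMA 1) or 0xD8 (DMA 2). seq must have
   room for 3 entries. Returns the number of entries written. */
static size_t dma_write16(struct port_write *seq, uint16_t flipflop_port,
                          uint16_t reg_port, uint16_t value)
{
    seq[0] = (struct port_write){ flipflop_port, 0xFF };              /* reset flip-flop (value ignored) */
    seq[1] = (struct port_write){ reg_port, (uint8_t)(value & 0xFF) }; /* low byte first */
    seq[2] = (struct port_write){ reg_port, (uint8_t)(value >> 8) };   /* then the high byte */
    return 3;
}
```

For example, loading the floppy track count 0x23FF into channel 2 produces: port 0x0C (reset), then 0xFF and 0x23 written to port 0x05.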


The second command register is the 'master reset and clear' register (0x0D on DMA 1, 0xDA on DMA 2). Writing anything to it resets the controller: the command, status, request and temporary registers and the flip-flop are cleared and all channels are masked, ready for you to program.


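<pre>
/* outb(port, value) is assumed to be your kernel's port-output helper. */
void hard_reset_DMA(void)
{
    /* Master clear of DMA 1 (any value). DMA 2 uses port 0xDA, but resetting it
       also masks the cascade channel 4 and with it all of DMA 1. */
    outb(0x0D, 0xFF);
}
</pre>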


The third command register is the 'clear mask register' (0x0E on DMA 1, 0xDC on DMA 2). Writing anything to this register will clear the mask register (wow!), releasing all of that controller's DMA channels to accept DMA requests.


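<pre>
/* outb(port, value) is assumed to be your kernel's port-output helper. */
void unmask_all_DMA(void)
{
    outb(0x0E, 0xFF);   /* DMA 1: clear all mask bits (any value) */
    outb(0xDC, 0xFF);   /* DMA 2: clear all mask bits */
}
</pre>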


== Address and Count Registers ==
Each DMA chip has 4 16-bit address registers, 4 16-bit count registers and 4 ''external'' 8-bit page registers.
As previously mentioned, a DMA controller ''cannot'' address or transfer more than 65536 bytes (or words, on the 16-bit channels) at a time.
The page registers allow access to the lower 16 MiB of memory but are ''not'' incremented when a transfer crosses a 64k boundary. The easiest way is to make sure the address you supply in the address register is ''zero'' and that any addressing you program the DMA controller to do is done using the external page register. It seems stupid, but unless you know that your transfer will not cross a 64k boundary, it is the safest.
Example page register settings (8-bit channels):
{| {{wikitable}}
! Page register
! Physical address (the DMA controller cannot use virtual addresses)
|-
| 0
| 0x00000
|-
| 1
| 0x10000 (64k)
|-
| 2
| 0x20000 (128k)
|-
| etc.
| etc.
|}


Make sure that you map the ''physical address'' of your DMA buffer somewhere in memory if using paging. This buffer can ''then'' be referred to by its virtual memory address. Mark the memory as non-pageable. It is not memory-mapped I/O, but treat it as if it were: think of it as borrowing memory from the system because the expansion card manufacturer was too stingy to supply adequate buffering memory on their card.


OK, the I/O port addresses:


{| {{wikitable}}
! DMA 1
! DMA 2
! Register name
|-
| 0x00
| 0xC0
| Address Register channel 0/4
|-
| 0x01
| 0xC2
| Count Register channel 0/4
|-
| 0x02
| 0xC4
| Address Register channel 1/5
|-
| 0x03
| 0xC6
| Count Register channel 1/5
|-
| 0x04
| 0xC8
| Address Register channel 2/6
|-
| 0x05
| 0xCA
| Count Register channel 2/6
|-
| 0x06
| 0xCC
| Address Register channel 3/7
|-
| 0x07
| 0xCE
| Count Register channel 3/7
|}


; Page Registers
<div>
<div style="float: left; margin: 5px">
{| {{wikitable}}
| 0x87
| channel 0
|-
| 0x83
| channel 1
|-
| 0x81
| channel 2
|-
| 0x82
| channel 3
|}
</div>
<div style="float: left; margin: 5px">
{| {{wikitable}}
| 0x8F
| channel 4 (useless due to cascading)
|-
| 0x8B
| channel 5
|-
| 0x89
| channel 6
|-
| 0x8A
| channel 7
|}
</div>
<div style="clear:both">&nbsp;</div>
</div>


For the pedantic among you, these page and address registers actually are used (though not in modern PCs): they hold the next DRAM memory address to be refreshed. The reason this is useless today is that the registers can only address 16 MB of memory, and the CPU would need to be involved to 'up' the page register every 64 KB, which it can't because the DMA controller has no way to inform the CPU it needs to be updated, because IBM forgot to connect the pins of the DMA controller to interrupts because they cost-cut on building the original PC motherboard! (Please don't feed that sentence through a grammar checker.)


'''Note:''' The number of bytes transferred is one more than the count programmed for the channel. So programming count = 0 will transfer one byte, while programming count = 0xffff will transfer the full 64k. (On the 16-bit channels the count is in words, so 0xffff transfers 128k.)
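The off-by-one is easy to get wrong, so here is a minimal C sketch of turning a transfer length in bytes into the count register value. The function name is hypothetical; it simply applies the "count + 1" rule above, counting words on the 16-bit channels.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a transfer length in bytes into the value for an 8237 count
   register. The controller moves count + 1 units: bytes on channels 0-3,
   16-bit words on channels 5-7. Returns false if the length cannot be
   encoded (zero, too long, or an odd byte count on a 16-bit channel). */
static bool dma_count_value(uint32_t bytes, bool sixteen_bit, uint16_t *count)
{
    uint32_t units = sixteen_bit ? bytes / 2u : bytes;

    if (bytes == 0 || units > 0x10000u || (sixteen_bit && (bytes & 1u)))
        return false;
    *count = (uint16_t)(units - 1u);
    return true;
}
```

A full 1.44 MiB track (0x2400 bytes) therefore gives 0x23FF, the value used in the floppy example.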


== [[Floppy Driver|Floppy Disk]] DMA Programming ==


After all that ranting, the next section may come as something of a let-down. Firstly, you only need to implement 3 routines to perform a DMA transfer. The example I am using is the floppy drive controller (probably the most common user of ISA DMA, followed by the Sound Blaster).


So putting it all together - let's set up a DMA buffer for channel 2 (the floppy disk).


<pre>
initialize_floppy_DMA:
; set DMA channel 2 to transfer data from 0 - 64k in memory
; paging must map this _physical_ memory elsewhere and _pin_ it from paging to disk!
; set the counter to 0x23ff, the length of a track on a 1.44 MiB floppy (18 sectors of 512 bytes) minus one
; the length of transfer is programmed count+1 (check the data sheet if you want)
; channel 2 is on DMA 1, so its flip-flop reset port is 0x0c
    mov al, 0x06
    out 0x0a, al        ; mask DMA channel 2
    out 0x0c, al        ; reset the master flip-flop (any value will do)
    xor al, al
    out 0x04, al        ; address to 0 (low byte)
    out 0x04, al        ; address to 0 (high byte)
    out 0x0c, al        ; reset the master flip-flop (again!!!)
    mov al, 0xff
    out 0x05, al        ; count to 0x23ff (low byte)
    mov al, 0x23
    out 0x05, al        ; count to 0x23ff (high byte)
    xor al, al
    out 0x81, al        ; external page register to 0 for total address of 00 00 00
    mov al, 0x02
    out 0x0a, al        ; unmask DMA channel 2
    ret
</pre>
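For comparison, here is a C sketch of the same initialization, generalized to any 8-bit channel (0-3). It assumes the standard PC/AT ports (address and count registers at 0x00-0x07, mask register 0x0A, DMA 1 flip-flop reset 0x0C, page registers 0x87/0x83/0x81/0x82) and returns the writes as a hypothetical `struct port_write` list for your own `outb` wrapper to replay. It does not check the buffer; the caller must already know it is below 16 MiB and does not cross a 64k boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* One I/O port write; a real driver would replay the list with outb(). */
struct port_write {
    uint16_t port;
    uint8_t  value;
};

/* Fill seq (at least 9 entries) with the writes that point 8-bit channel
   ch (0-3) at the physical buffer [phys, phys + bytes), 1..65536 bytes.
   Returns the number of entries written. */
static size_t dma8_setup(struct port_write *seq, unsigned ch,
                         uint32_t phys, uint32_t bytes)
{
    static const uint8_t page_port[4] = { 0x87, 0x83, 0x81, 0x82 };
    size_t n = 0;

    ch &= 3u;
    uint16_t addr_port  = (uint16_t)(ch * 2u);       /* 0x00, 0x02, 0x04, 0x06 */
    uint16_t count_port = (uint16_t)(ch * 2u + 1u);  /* 0x01, 0x03, 0x05, 0x07 */
    uint16_t count      = (uint16_t)(bytes - 1u);    /* controller moves count + 1 bytes */

    seq[n++] = (struct port_write){ 0x0A, (uint8_t)(0x04u | ch) };               /* mask the channel */
    seq[n++] = (struct port_write){ 0x0C, 0xFF };                                /* reset flip-flop */
    seq[n++] = (struct port_write){ addr_port, (uint8_t)(phys & 0xFFu) };        /* address bits 0-7 */
    seq[n++] = (struct port_write){ addr_port, (uint8_t)((phys >> 8) & 0xFFu) }; /* address bits 8-15 */
    seq[n++] = (struct port_write){ 0x0C, 0xFF };                                /* reset flip-flop again */
    seq[n++] = (struct port_write){ count_port, (uint8_t)(count & 0xFFu) };      /* count low byte */
    seq[n++] = (struct port_write){ count_port, (uint8_t)(count >> 8) };         /* count high byte */
    seq[n++] = (struct port_write){ page_port[ch], (uint8_t)((phys >> 16) & 0xFFu) }; /* bits 16-23 */
    seq[n++] = (struct port_write){ 0x0A, (uint8_t)ch };                         /* unmask the channel */
    return n;
}
```

Calling `dma8_setup(seq, 2, 0, 0x2400)` yields the same nine writes as the assembly routine, with the flip-flop resets going to port 0x0C.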


Please note: if you want to be stingy about memory, you can use any of the memory above the maximum transfer size (0x2400 bytes in this case) for something else. Once you have set up the address and transfer count you do not need to touch them again; to select reading or writing you use the mode register.


<pre>
prepare_for_floppy_DMA_write:
    mov al, 0x06
    out 0x0a, al        ; mask DMA channel 2
    mov al, 0x5a        ; 01011010
    out 0x0b, al        ; single transfer, address increment, autoinit, read from memory (floppy write), channel 2
    mov al, 0x02
    out 0x0a, al        ; unmask DMA channel 2
    ret

prepare_for_floppy_DMA_read:
    mov al, 0x06
    out 0x0a, al        ; mask DMA channel 2
    mov al, 0x56        ; 01010110
    out 0x0b, al        ; single transfer, address increment, autoinit, write to memory (floppy read), channel 2
    mov al, 0x02
    out 0x0a, al        ; unmask DMA channel 2
    ret
</pre>


The above uses single transfer only for compatibility, but if during the initialization of your floppy driver you detect a newer floppy controller (using the init command), demand transfer should be used to reduce overhead. In this case, for a read, the mode byte written to port 0x0b would change from 0x56 (01010110) to 0x16 (00010110), which selects demand transfer instead of single transfer.


=== Usage ===


;Initialization
During the initialization of your floppy driver:


* Map enough physical memory to virtual memory for the size of transfer you are going to do.
* ''(No more than 64k; all transfers must be done in 64k chunks.)''
* ''(Memory must begin on a 64k boundary between 0 and 16 MiB.)''
* Pin the allocated pages so you don't get page faults during reads or writes.
* Call initialize_floppy_DMA, altered with suitable values.


;For a write
* Call prepare_for_floppy_DMA_write.
* Copy 18 sectors' worth of information to your DMA buffer.
* Send a write command to the floppy controller for 18 sectors (say track 1).
* The floppy will now write 18 sectors of information! Its interrupt (IRQ 6) tells you when it has finished.
* Copy the next 18 sectors' worth of information to your DMA buffer.
* Send a write command to the floppy controller for 18 sectors (say track 2).


Note that there is no need to fiddle with the DMA addressing: because the channel was programmed with auto-init, it automatically resets itself to 0x0, ready for another transaction.


;For a read
* Call prepare_for_floppy_DMA_read.
* Send a read command to the floppy controller for 18 sectors (say track 1) and wait for its interrupt.
* Multiply (the sector number you wanted - 1) by 512 and use that as an offset into the DMA buffer; sectors on a track are numbered from 1.
* On the next read you can check whether the track you want is already in the buffer and go straight to it. (This is called track buffering.)
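A sketch of the offset arithmetic and the track-buffer check, assuming a 1.44 MiB floppy (18 sectors of 512 bytes per track) and CHS sector numbers starting at 1, as they do on PC floppies. `cached_track` and both helpers are hypothetical names; a real driver would update `cached_track` after each successful read command and invalidate it after a write or disk change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE       512u
#define SECTORS_PER_TRACK 18u   /* 1.44 MiB floppy */

/* Which track (cylinder and head flattened into one number) is currently
   in the DMA buffer, or -1 if none. */
static int32_t cached_track = -1;

/* Byte offset of a 1-based sector (1..SECTORS_PER_TRACK) within a DMA
   buffer that holds one whole track. */
static size_t track_buffer_offset(unsigned sector)
{
    return (size_t)(sector - 1u) * SECTOR_SIZE;
}

/* True if a read of `track` can be served from the buffer without
   touching the drive. */
static bool track_is_buffered(int32_t track)
{
    return track == cached_track;
}
```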


The good thing about this setup is that it is simple and fast. There is no need to set up a track caching scheme and associated code (apart from checking if the current track number is in the cache).


I wouldn't advise trying to improve the speed of floppy access any further: the floppy disk is so slow compared to the rest of the computer that another millisecond of your time is going to be unnoticeable. Perhaps the only real improvement that could be made is to read in track 0 on the first read of a floppy disk and buffer the FAT (File Allocation Table) or inode information. This would only be useful if you were never going to write to the floppy, as the first append to a file will force the OS to update the FAT in memory, copy the altered track 0 to the DMA buffer and perform a floppy write.
Putting it simply, there's no real way to improve the write performance of a floppy disk unless you are writing entire tracks of information at once and aiming for the fastest floppy formatting time record!


== In Conclusion ==


As I've (constantly) pointed out, the DMA chip has lots of useful features. The main point in programming the chip is remembering the features that '''don't''' work. Regardless of the problems, the less the DMA chip is used the better, because of its negative impact on system performance (4.77 MHz, 8-bit data transfer). USB floppy disk drives are now available and standard interface floppy disk drives are only an option on some systems. The future will only make this more so.


== References ==
=== External Links ===

* [http://www.stud.fh-hannover.de/~heineman/extern/231466.pdf Intel 8237A datasheet]



Revision as of 21:30, 31 October 2008

ISA DMA (Industry Standard Architecture Direct Memory Access) in today's PC architecture, like ISA itself, is in many ways like an appendix. It is used by the floppy disk controller, ISA sound cards, ISA network cards and parallel ports (if they support ECP mode). Whilst interrupt, keyboard and timer interface circuits have obvious and relevant uses, the ISA DMA controller and its programming interface are still well and truly stuck in the 1970s, where they were first designed. Modern PCI controllers typically have their own 'bus mastering', which is far better.

The idea behind DMA is that you can set up a 'channel' with an address pointing into memory and the length of the data to be transferred. Once set up, you can tell the peripheral owning the 'channel' to do whatever it is supposed to do (e.g. read a sector). Then the CPU can go on to something else. Periodically, when the data bus isn't being used by the CPU, the DMA chip takes over and transfers data between the peripheral and memory without involving the CPU. When the transfer is complete (e.g. an entire sector has been sent to the floppy drive) the DMA chip will then signal that it is finished. The DMA chip can even signal if it has run out of data, allowing the CPU to program some new information and another transaction to proceed. DMA can improve the speed of a system quite a bit and was borrowed by Intel (who designed the DMA controller chip) from the old 1960s mainframes, which had DMA channels for all devices (CPUs weren't all on a single chip and were very slow back then).

Of course all good ideas can have downsides and while Intel can't really be blamed for what is about to be described, IBM certainly can.

DMA Genesis, Chapter 1, Verse 1

In the beginning there was a PC, but the PC was slow. IBM looked down from the heavens and said "Slap on a DMA controller - that should speed it up." IBM's heart was in the right place, but its collective brains were elsewhere, as the DMA controller never met the needs of the system. The PC/AT standard contains 2 Intel 8237A DMA chips, connected as Master/Slave. The second chip is Master, and its first line (Channel 4) is used by the first chip, which is Slave. (This is unlike the interrupt controller, where the first chip is Master.) The 8237A was designed for the old 8080 8-bit processor, and this is probably the main reason for so many DMA problems: the 8088 and 8086 processors chosen by IBM for its PC were too advanced for the DMA controller.

Each 8237 DMA chip has 4 DMA channels. DMA0-DMA3 on the first chip and DMA4-DMA7 on the second. The first DMA controller is wired up for 8 bit transfers, while the second is wired up for 16 bit transfers.

DMA Channel 0 is unavailable as it was used for memory refresh, and remains reserved because of this (even though modern computers don't use it). DMA channel 4 cannot be used for peripherals because it is used for cascading the other DMA controller.

DMA Channels 1, 2 and 3 can only transfer 64 KB of information at a time (the internal registers are only 16 bit). In order to extend this, IBM added an external page register allowing access to 16 MB of memory (24 bits total). If a DMA transfer crosses a 64 KB boundary, the internal address register wraps around to zero and the external page register is not incremented. The DMA controller will happily provide the attached peripheral with whatever it finds at the new address. As a rule, 8 bit DMA transfers must not cross 64 KB boundaries and must never be bigger than 64 KB.

DMA Channels 5, 6 and 7 transfer data 16 bits at a time. Their address registers hold word addresses (the physical address shifted one bit to the right), so the lowest address bit is always zero and the count is in words. The addressing is otherwise exactly the same, except you can now start on 128 KB boundaries and transfer up to 128 KB of information at a time before the DMA boldly transfers what has never been transferred before...
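A minimal C sketch of how a physical address (below 16 MiB) splits into the page register and 16-bit address register values for both channel widths. The struct and function names are hypothetical. It assumes, as is common in PC/AT drivers, that bit 0 of the page register is ignored on the 16-bit channels (address bit 16 comes from the address register there), so the sketch clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values to load into a channel's page register and 16-bit address register. */
struct dma_addr {
    uint8_t  page;
    uint16_t offset;
};

/* 8-bit channels (0-3): address register = bits 0-15, page = bits 16-23.
   16-bit channels (5-7): address register = word address (bits 1-16),
   page = bits 17-23 (bit 0 of the page register is cleared). */
static struct dma_addr dma_split_address(uint32_t phys, bool sixteen_bit)
{
    struct dma_addr a;

    if (sixteen_bit) {
        a.offset = (uint16_t)((phys >> 1) & 0xFFFFu);
        a.page   = (uint8_t)((phys >> 16) & 0xFEu);
    } else {
        a.offset = (uint16_t)(phys & 0xFFFFu);
        a.page   = (uint8_t)((phys >> 16) & 0xFFu);
    }
    return a;
}
```

For instance, 0x30000 on a 16-bit channel becomes page 0x02 (the 128 KB window starting at 0x20000) with word offset 0x8000 (byte offset 0x10000).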

Previously it was mentioned that the DMA controller is able to signal completion and even ask for more information. Unfortunately this would make expansion slots too big, so IBM left all of the connections to the DMA chips off. The only way you know that a transfer is complete is when the peripheral signals an interrupt. This implies that all peripherals using a DMA channel are limited to no more than 64/128 KB transfers for fear of upsetting the DMA controller.

Finally, all DMA controllers run at 4 MHz. (Information taken from ISA Specification) No exceptions. Ignore that GHz tag on your processor or the fact that even PCI runs at 33 MHz. DMA controllers are fixed at this rate. This includes EISA and PS/2 32-bit DMA controllers. The only differences between these and the stock ISA DMA controllers are an extra page register allowing for 64k transfers anywhere in 4 GiB of space and the ability to do 32-bit transfers. These DMA controllers exist only on EISA and MCA systems, which are now obsolete and are not described here.

So after reading all the above, the main points are:

  • DMA channel 1, 2 and 3 are for 8 bit transfers;
  • DMA channel 5, 6 and 7 are for 16 bit transfers;
  • Transfers shouldn't cross 64 KB or 128 KB boundaries;
  • Transfers can only be from the lowest 16 MB of memory;
  • DMA is slow - theoretically 4 MB/second but more like 500 KB/second due to ISA bus protocols.
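The main points above can be turned into a single check. This is a minimal C sketch, with a hypothetical function name, that tests whether a physical buffer can be handed to the DMA controller as-is. It additionally requires an even start and length on the 16-bit channels, since those transfer whole words.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [phys, phys + len) lies in the lowest 16 MiB, is at most
   64 KiB (8-bit channels) or 128 KiB (16-bit channels) long, and does
   not cross a 64 KiB or 128 KiB boundary respectively. */
static bool isa_dma_buffer_ok(uint32_t phys, uint32_t len, bool sixteen_bit)
{
    const uint32_t window = sixteen_bit ? 0x20000u : 0x10000u;

    if (len == 0 || len > window)
        return false;
    if (phys >= 0x1000000u || len > 0x1000000u - phys)   /* must end below 16 MiB */
        return false;
    if (sixteen_bit && ((phys | len) & 1u))              /* word-aligned start and length */
        return false;
    /* the first and last byte must fall in the same 64/128 KiB window */
    return (phys / window) == ((phys + len - 1u) / window);
}
```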

Note that:

  • Sound Blaster and Sound Blaster PRO only support 8 bit DMA;
  • Sound Blaster 16+ supports both;
  • Floppy disk controllers only support 8 bit DMA and usually use DMA Channel 2.

Busmaster DMA

The IBM/XT used DMA channel 3 for hard disk connections, but the IBM/AT did not use DMA for the hard disk at all; instead the processor transferred the data itself. This was because of the 64/128 KB limitations outlined above and the fact that the 286 processor could perform 16 bit transactions at 6 MHz. Even the ISA bus could run at a speed of up to 12 MHz, far faster than the 4.77 MHz the DMA controller was running at.

Expansion card designers were also upset with DMA's lack of capabilities, notably 'Hard-Card' hard disk expansion card manufacturers, who depended on the speed of data transfer.

To get around the limitations of the 'on board' DMA controller, expansion card manufacturers began to put their own DMA controllers on their expansion cards. The controllers didn't have any of the design 'features' of the 8237A and could address all 16 MiB of ISA addressable memory with the full 16 bit data bus width. They functioned exactly the same way as the 'on board' DMA, 'stealing bus cycles' when the processor wasn't looking and thus improving the performance of the system as a whole.

Busmaster DMA is probably easier to understand than normal DMA. Provided you know what registers are on your expansion card, you can send them the address and the amount of data to transfer. The expansion card will send you an interrupt upon completion. PCI Bus Masters can of course access memory with 32 bit addressing. Newer PCI cards are starting to support 64 bit addressing (although at the moment most don't). Typically PCI cards use 'scatter-gather' bus mastering, where one page is used as a directory of data pages. This makes them especially suitable for paging operating systems (where contiguous pages are rare).

ISA Bus Masters are still limited to the lower 16 MiB of memory - this is an ISA limitation.

Programming the PC DMA controllers

Paging

As this article assumes you are writing your own OS, the subject of memory management is very important unless you are writing a 286 based operating system. This is because as mentioned earlier, all DMA transactions must occur in the lower 16 MiB of memory. (Yes, I'm repeating myself - memorize - it is crucial.)

To be more specific, all DMA transactions must occur in physical memory. The DMA controller only functions with physical memory: if you pass it a virtual address happily mapped into anywhere in memory, the DMA controller will happily ignore your carefully constructed page mapping and access whatever it decides to. VM86 mode does not use physical addressing; the physical addresses are fake. In VM86 mode the OS must emulate any DMA transactions on behalf of an application, probably in a totally non-DMA way. Almost all DOS memory managers and operating systems that use V86 mode to run DOS applications (including Windows 3.x enhanced mode and EMM386) supply virtual DMA functions, but these are basically the same: the V86 monitor 'fakes' the DMA transaction. DMA has no knowledge of paging whatsoever, and I should point out that this includes Bus Master DMA.

Finally (and this is a great laugh when it happens) don't forget to 'pin' the pages in memory. Imagine the floppy drive trying to read data from memory and you swap that data out to the hard drive. The DMA chip doesn't know - "Paging? What's that then? Dunno. You've just mapped in the password file? No problem, I'll write that out to the floppy instead." DMA memory is like memory-mapped I/O - do not move - do not page. (I apologise for sounding really dictatorial, it's just that DMA is so easy to screw up. Lots more things to screw up later! - just wait until we get to commands.)

One way to handle this situation is to manage memory below 16 MB using a 'bitmap' so that contiguous pages that don't cross 64 KB or 128 KB boundaries can be found, and use free page stacks (or anything else) for memory above 16 MB.
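A minimal sketch of that idea in C: a first-fit bitmap over the 4 KiB pages below 16 MiB that finds a contiguous run not crossing a 64 KB (16-page) or 128 KB (32-page) boundary. All names are hypothetical, and page 0 is assumed to be reserved (it holds the real-mode interrupt table) so that 0 can signal failure.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define LOW_PAGES (0x1000000u / PAGE_SIZE)   /* 4096 pages below 16 MiB */

/* One bit per page below 16 MiB; a set bit means "in use". */
static uint8_t low_bitmap[LOW_PAGES / 8];

static bool page_used(uint32_t p) { return low_bitmap[p / 8] & (1u << (p % 8)); }
static void page_set(uint32_t p)  { low_bitmap[p / 8] |= (uint8_t)(1u << (p % 8)); }

/* Claim `count` contiguous free pages that stay inside one window of
   boundary_pages pages (16 for 64 KiB, 32 for 128 KiB). Returns the
   physical address of the first page, or 0 on failure. */
static uint32_t alloc_dma_pages(uint32_t count, uint32_t boundary_pages)
{
    if (count == 0 || count > boundary_pages)
        return 0;
    for (uint32_t start = 1; start + count <= LOW_PAGES; start++) {
        /* skip runs that would spill into the next window */
        if (start / boundary_pages != (start + count - 1) / boundary_pages)
            continue;
        bool free_run = true;
        for (uint32_t i = 0; i < count && free_run; i++)
            free_run = !page_used(start + i);
        if (free_run) {
            for (uint32_t i = 0; i < count; i++)
                page_set(start + i);
            return start * PAGE_SIZE;
        }
    }
    return 0;
}
```

A linear scan is slow but fine here, since DMA buffers are normally allocated once at driver initialization; memory above 16 MiB can still be handed out by a faster free-page stack.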

An example for getting it right:

  • When your OS starts it detects (or is told of) a device requiring DMA memory.
  • Map the lowest 64K of physical memory elsewhere in memory and aim the DMA channel you are setting up at address 0.
  • Set up the DMA channel for the correct type of transaction (more on this later).
  • For the next DMA channel, map the next 64K of physical memory somewhere and aim your DMA channel at address 64K.
  • Set up the DMA channel for the correct type of transaction (more on this later).
  • Complete building your page stacks, bitmaps or whatever, and add any lower memory left over.

Apart from making sure that you have a copy somewhere of the information in the first 4K of memory and the Extended BIOS Data Area (these are for VM86 tasks if you decide to add BIOS/DOS/VESA/PCI support later), the above example will allow you to have a flat paged memory area from 0 to forever while still allowing your OS to use old DMA-dependent peripherals. The device drivers work with CPU paged memory areas, while the DMA controller is happily working with low memory under 16 MiB. Have the driver write data to the paging-relocated area, send commands to the peripheral, and the DMA chip will access physical memory correctly when asked to do so.

Registers

Each 8237A has the following registers, addressed via I/O space:

8237A No. 1   8237A No. 2   Read/Write   Function
0x08          0xD0          R            Status Register
0x08          0xD0          W            Command Register
0x09          0xD2          W            Request Register
0x0A          0xD4          W            DMA Channel Mask Register
0x0B          0xD6          W            DMA Mode Register
0x0C          0xD8          W            Flip-Flop Reset Register
0x0D          0xDA          R            Intermediate Register
0x0D          0xDA          W            Master Reset Register
0x0E          0xDC          W            Mask Reset Register
0x0F          0xDE          W            DMA Mask Register

Each DMA channel has an external page address register that supplies the upper bits of the 24-bit transfer address:

0x87 DMA Channel 0 Page Address Register (bits A16 - A23)
0x83 DMA Channel 1 Page Address Register (bits A16 - A23)
0x81 DMA Channel 2 Page Address Register (bits A16 - A23)
0x82 DMA Channel 3 Page Address Register (bits A16 - A23)
0x8F DMA Channel 4 Page Address Register (unused; channel 4 is the cascade)
0x8B DMA Channel 5 Page Address Register (bits A17 - A23)
0x89 DMA Channel 6 Page Address Register (bits A17 - A23)
0x8A DMA Channel 7 Page Address Register (bits A17 - A23)
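Putting the two port lists together, here is a C sketch of a per-channel port lookup. It assumes the standard PC/AT assignments (the same ones the floppy example uses, e.g. 0x81 for channel 2), with the second controller's registers on even ports from 0xC0; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* I/O ports needed to program one ISA DMA channel. */
struct dma_channel_ports {
    uint8_t address, count, page;   /* per-channel registers */
    uint8_t mask, mode, flipflop;   /* shared by the channel's controller */
};

/* Look up the ports for channel 0-7. Channel 4 is the cascade and should
   not be programmed by drivers. */
static struct dma_channel_ports dma_ports(unsigned channel)
{
    static const uint8_t page[8] = { 0x87, 0x83, 0x81, 0x82, 0x8F, 0x8B, 0x89, 0x8A };
    unsigned c = channel & 7u, n = c & 3u;
    struct dma_channel_ports p;

    if (c < 4) {                    /* DMA 1: consecutive ports 0x00-0x0F */
        p.address = (uint8_t)(n * 2u);
        p.count   = (uint8_t)(n * 2u + 1u);
        p.mask = 0x0A; p.mode = 0x0B; p.flipflop = 0x0C;
    } else {                        /* DMA 2: even ports 0xC0-0xDE */
        p.address = (uint8_t)(0xC0u + n * 4u);
        p.count   = (uint8_t)(0xC2u + n * 4u);
        p.mask = 0xD4; p.mode = 0xD6; p.flipflop = 0xD8;
    }
    p.page = page[c];
    return p;
}
```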

(Bit Patterns from Indispensable PC Hardware Book - I'm commenting the useful parts i.e. the bits you can actually use, a lot of the 'features' of the 8237 do not function in the PC and the fact they do not work introduces new 'features'. Please assume that if anything isn't explicitly mentioned, it implicitly does not function. Messing with things not mentioned will usually result in screwing up of your computer. Don't take my word for it (I could be Kermit the Frog for all you know!) so feel free to ask other people and do whatever makes you feel good.)

Status Registers 0x08 and 0xD0 (Read)
Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
REQ3 REQ2 REQ1 REQ0 TC3 TC2 TC1 TC0
  • REQ3-0: When set: DMA Request Pending.
  • TC3-0: When set: Transfer Complete.

This register isn't very important in light of the fact that the 8237 can't send an IRQ to tell you that it has finished. Usually there is no need to poll this register as again the peripheral owner will send an interrupt when a transaction has finished and polling would be a total waste of time.


Command Registers 0x08 and 0xD0 (Write)
Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
DACKP DRQP EXTW PRIO COMP COND ADHE MMT

This register really shows how incompatible the 8237 is with the PC hardware.

  • Let's start with EXTW and COMP. These increase the speed of DMA transfer by 25% by removing one of the clock cycles. Does it work? No.
  • PRIO. When set, this allows DMA priorities to be rotated, allowing freedom and liberty for all peripherals that share the data bus. Does it work? No.
  • MMT and ADHE. Did you know that the IBM PC could do memory to memory transfers since 1981? That's right, hardware sprites, hardware frame buffering from one location to another. Does it work? No.
  • COND. Hooray, the only bit in the command register that does something useful. Setting this bit disables the DMA controller. This lets you set up multiple DMA channels without masking each and every channel.

Request Registers 0x09 and 0xD2 (Write)

Used for memory to memory transfers and setting up priority rotation - absolutely useless.

DMA Channel Mask Registers 0x0A and 0xD4 (Write)
Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
- - - - - STCL SEL1 SEL0

To mask a channel for programming/whatever set STCL to 1 and SEL0 and SEL1 to select the channel to be masked.

  • Sending 0x4(100b) would mask DMA Channel 0 of the appropriate 8237.
  • Sending 0x0(000b) would unmask DMA Channel 0 of the appropriate 8237.
  • To unmask a channel set STCL to 0.

This register only allows you to mask/unmask a single DMA channel. If you need to mask multiple DMA channels, you need to use the following:

DMA Mask Registers 0x0F and 0xDE (Write)
Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
- - - - STC3 STC2 STC1 STC0

Setting a bit to 1 masks the corresponding channel and clearing it to 0 unmasks it: DMA channels 3, 2, 1, 0 on the first controller or 7, 6, 5, 4 on the second. Note that masking DMA channel 4 will also mask channels 0, 1, 2 and 3, because the first controller is cascaded through channel 4.
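The two mask registers take differently shaped bytes, so here is a small C sketch of building each one. The function names are hypothetical; the bit layouts are the ones given above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte for the single-channel mask register (0x0A on DMA 1, 0xD4 on DMA 2):
   bit 2 (STCL) set masks the channel, clear unmasks it; bits 0-1 select
   the channel within the controller. */
static uint8_t dma_single_mask_byte(unsigned channel, bool masked)
{
    return (uint8_t)((masked ? 0x04u : 0x00u) | (channel & 3u));
}

/* Byte for the multi-channel mask register (0x0F on DMA 1, 0xDE on DMA 2):
   bit n set masks channel n of that controller. */
static uint8_t dma_multi_mask_byte(bool ch0, bool ch1, bool ch2, bool ch3)
{
    return (uint8_t)((ch0 ? 1u : 0u) | (ch1 ? 2u : 0u) |
                     (ch2 ? 4u : 0u) | (ch3 ? 8u : 0u));
}
```

Masking channel 2 gives 0x06 and unmasking it gives 0x02, the values used in the floppy routines.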

DMA Mode Registers 0x0B and 0xD6 (Write)
Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
MOD1 MOD0 IDEC AUTO TRA1 TRA0 SEL1 SEL0

This register is tricky as it depends highly on the peripheral you are programming the DMA controller for.

  • SEL0 and SEL1 select the channel you want to change;
  • TRA0 and TRA1 select the transfer type;
    • 0b00 performs a verify (self-test) transfer: addresses are generated but no data is read or written;
    • 0b01 DMA Channel is for writing to memory;
    • 0b10 DMA Channel is for reading from memory;
    • 0b11 invalid.
  • AUTO: When this bit is set, after a transfer has completed the channel resets itself to the values you programmed into it. This is great for floppy transfers: read in a track and the values set themselves up for reading again immediately. For writing you'd only need to alter the transfer mode, not the addresses. Some expansion cards, such as the Sound Blaster 1.x, do not support auto-init DMA; these devices will crash if used with auto-init DMA. Sound Blaster 2.0 and later do support auto-init DMA.
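  • IDEC: Increment/Decrement. When this bit is set, the address is decremented after each transfer instead of incremented. I guess if you wanted to write big-endian data to a floppy drive you could, but computers based on big-endian processors wouldn't be using an 8237.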
  • MOD0 and MOD1: This is where some problems can arise based on the peripheral the DMA controller is attached to. The DMA controller has several modes:
    • 0b00 = Transfer on Demand;
    • 0b01 = Single DMA Transfer;
    • 0b10 = Block DMA Transfer;
    • 0b11 = Cascade Mode (use to cascade another DMA controller).
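Putting the mode register fields together in C. This is a sketch; the enum and function names are hypothetical, and the field values are the ones listed above.

```c
#include <stdint.h>

/* TRA1:TRA0 values (bits 3-2). */
enum dma_transfer { DMA_VERIFY = 0, DMA_WRITE_MEM = 1, DMA_READ_MEM = 2 };
/* MOD1:MOD0 values (bits 7-6). */
enum dma_mode { DMA_DEMAND = 0, DMA_SINGLE = 1, DMA_BLOCK = 2, DMA_CASCADE = 3 };

/* Assemble a byte for the mode register (0x0B / 0xD6). */
static uint8_t dma_mode_byte(enum dma_mode mode, int decrement, int autoinit,
                             enum dma_transfer transfer, unsigned channel)
{
    return (uint8_t)(((unsigned)mode << 6)             /* MOD1:MOD0 */
                   | ((decrement ? 1u : 0u) << 5)      /* IDEC */
                   | ((autoinit ? 1u : 0u) << 4)       /* AUTO */
                   | ((unsigned)transfer << 2)         /* TRA1:TRA0 */
                   | (channel & 0x03));                /* SEL1:SEL0 */
}
```

For example, dma_mode_byte(DMA_SINGLE, 0, 1, DMA_READ_MEM, 2) gives 0x5A, the value the floppy write routine later in this article uses.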

Single transfer mode is good for peripherals that cannot buffer a lot of data at once. A good example is early floppy controllers, which have only a very small buffer. The DMA controller and CPU can share the data bus happily. Sound Blaster and Sound Blaster Pro use software or Single Transfer DMA Mode.

Block transfer mode is good for peripherals that can buffer entire blocks of information. A good example of this is an ESDI or SCSI controller without Bus Master DMA.

Demand transfer mode is good for peripherals that start and stop intermittently, such as a tape drive. The drive can read a whole load of information for as long as it can and then suspend the transfer to move to another section of the tape. Newer floppy controllers also work well with demand transfer because they have FIFO buffers to store information being read and written. The peripheral controls the flow, and as the information flow is uninterrupted, performance can be gained. CPUs in these later computers generally have caches and can continue working uninterrupted during a demand DMA transfer. Older computers will slow down as their CPUs wait for the data bus to become available.

In addition to the above registers, each 8237 has three command registers:

The first command register is the 'reset flip-flop' register, at port 0x0C on the first controller (channels 0-3) and 0xD8 on the second (channels 4-7). The DMA controller is an 8-bit chip. To write a 16-bit value into an address or count register, you must write two 8-bit values, low byte first, to the appropriate register port (listed below). The controller uses an internal flip-flop to track which byte comes next. Imagine, however, that you write only one 8-bit value and the next time try to write a 16-bit value: the DMA chip will be expecting the high byte, not the low byte. Clearing the flip-flop guarantees that the address gets into the DMA controller in the right order. To reset the flip-flop, write any value to the register:

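ASM
mov al, 0xFF
out 0xd8, al        ; second 8237; use port 0x0c for channels 0-3
C
void reset_flipflop_DMA(void)
{
    outb(0xd8, 0xFF);   /* second 8237; use 0x0c for channels 0-3 */
}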

Another rule: the reset flip-flop command must be sent before every 16-bit transaction. The flip-flop is never reset automatically; it simply toggles with every byte access, so a single stray access leaves it out of step. Easily missed, and it can cause many problems.
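A C sketch of a 16-bit register write that always resets the flip-flop first. Here outb is a hypothetical stub that just records each port write, so the byte order can be checked without hardware; in a kernel you would use your own outb(port, value). The flip-flop port is 0x0C on the first 8237 and 0xD8 on the second.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's outb(port, value): it records the
 * writes instead of touching hardware. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write trace[16];
static int trace_len;

static void outb(uint16_t port, uint8_t value)
{
    if (trace_len < 16)
        trace[trace_len++] = (struct port_write){ port, value };
}

/* Write a 16-bit value to an address or count register: reset the
 * flip-flop, then send the low byte followed by the high byte. */
static void dma_write16(uint16_t flipflop_port, uint16_t reg_port, uint16_t value)
{
    outb(flipflop_port, 0xFF);                 /* any value resets it */
    outb(reg_port, (uint8_t)(value & 0xFF));   /* low byte first */
    outb(reg_port, (uint8_t)(value >> 8));     /* then the high byte */
}
```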

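The second command register is the 'master reset and clear' register. Writing anything to it (0x0d on the first controller, 0xda on the second) has the same effect as a hardware reset: the controller's command, status and request registers and the flip-flop are cleared, and all channels are masked, ready for you to program.

ASM
mov al, 0xFF
out 0x0d, al        ; first 8237; use port 0xda for the second
; reinitialize all DMA channels now
C
void hard_reset_DMA(void)
{
    outb(0x0d, 0xFF);   /* first 8237; use 0xda for the second */
}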

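The third command register is the 'clear mask register'. Writing anything to it (0x0e on the first controller, 0xdc on the second) clears the mask register (wow!), releasing all of that controller's DMA channels to accept DMA requests.

ASM
mov al, 0xFF
out 0xdc, al        ; second 8237; use port 0x0e for the first
; all channels on this controller now accept DMA requests
C
void unmask_all_DMA(void)
{
    outb(0xdc, 0xFF);   /* second 8237; use 0x0e for the first */
}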
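Since each of these commands exists once per controller, a small lookup table keeps the port numbers straight. A hypothetical C sketch, using the standard PC/AT assignments (first 8237: 0x0C, 0x0D, 0x0E; second 8237: 0xD8, 0xDA, 0xDC):

```c
#include <stdint.h>

/* Command ports of one 8237. The value written to any of them is ignored. */
struct dma_cmd_ports {
    uint16_t flipflop;      /* reset flip-flop */
    uint16_t master_clear;  /* master reset and clear */
    uint16_t clear_mask;    /* unmask all channels */
};

static const struct dma_cmd_ports dma_cmd[2] = {
    { 0x0C, 0x0D, 0x0E },   /* first 8237: channels 0-3 */
    { 0xD8, 0xDA, 0xDC },   /* second 8237: channels 4-7 */
};

static const struct dma_cmd_ports *dma_cmd_for_channel(unsigned channel)
{
    return &dma_cmd[channel < 4 ? 0 : 1];
}
```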

Address and Count Registers

Each DMA chip has four 16-bit address registers, four 16-bit count registers and four external 8-bit page registers. As previously mentioned, a DMA controller cannot address or transfer more than 65536 bytes or words of information at a time. The page registers allow access to the lower 16 MiB of memory, but they are not incremented when a transfer crosses a 64 KiB boundary; the address simply wraps around within the same page. The easiest way to avoid this is to make sure the address you supply in the address register is zero and to do all addressing with the external page register. It seems stupid, but unless you know that your transfer will not cross a 64 KiB boundary, it is the safest approach.

Example page register settings:

Page register value Physical base address
0 0x00000
1 0x10000 (64 KiB)
2 0x20000 (128 KiB)
... ...
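In C, splitting a physical address into the page and address register values and checking whether a transfer would cross a 64 KiB boundary might look like this sketch. It assumes an 8-bit channel (0-3); the names are hypothetical.

```c
#include <stdint.h>

/* Page register value (bits 23-16) and address register value (bits 15-0)
 * for a physical address below 16 MiB on an 8-bit channel. */
struct dma_addr { uint8_t page; uint16_t offset; };

static struct dma_addr dma_split(uint32_t phys)
{
    struct dma_addr a;
    a.page = (uint8_t)((phys >> 16) & 0xFF);
    a.offset = (uint16_t)(phys & 0xFFFF);
    return a;
}

/* The page register is not incremented, so the first and last byte of a
 * transfer must lie in the same 64 KiB page. `length` must be at least 1. */
static int dma_crosses_64k(uint32_t phys, uint32_t length)
{
    return (phys >> 16) != ((phys + length - 1) >> 16);
}
```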

If you are using paging, make sure that the physical address of your DMA buffer is mapped somewhere in virtual memory. The buffer can then be referred to by its virtual address. Mark the memory as non-pageable. Even though it's not memory-mapped I/O, it is, except that it isn't. Think of it as borrowing memory from the system because the expansion card manufacturer was too stingy to supply adequate buffering memory on their card.

OK, the I/O port addresses:

DMA 1 DMA 2 Register name
0x00 0xC0 Address Register channel 0/4
0x01 0xC2 Count Register channel 0/4
0x02 0xC4 Address Register channel 1/5
0x03 0xC6 Count Register channel 1/5
0x04 0xC8 Address Register channel 2/6
0x05 0xCA Count Register channel 2/6
0x06 0xCC Address Register channel 3/7
0x07 0xCE Count Register channel 3/7
Page Registers
0x87 channel 0
0x83 channel 1
0x81 channel 2
0x82 channel 3
0x8F channel 4 (useless due to cascading)
0x8B channel 5
0x89 channel 6
0x8A channel 7
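The per-channel ports as a C lookup table, as a sketch. Note that the second controller's address and count registers sit at even port addresses (0xC0, 0xC2, ... 0xCE). Channel 4 is the cascade and is listed only to keep the array indexed by channel number.

```c
#include <stdint.h>

struct dma_chan_ports { uint16_t address, count, page; };

/* Indexed by DMA channel number 0-7. */
static const struct dma_chan_ports dma_ports[8] = {
    { 0x00, 0x01, 0x87 }, { 0x02, 0x03, 0x83 },   /* channels 0, 1 */
    { 0x04, 0x05, 0x81 }, { 0x06, 0x07, 0x82 },   /* channels 2, 3 */
    { 0xC0, 0xC2, 0x8F }, { 0xC4, 0xC6, 0x8B },   /* channels 4 (cascade), 5 */
    { 0xC8, 0xCA, 0x89 }, { 0xCC, 0xCE, 0x8A },   /* channels 6, 7 */
};
```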
 

For the pedantic among you, the page and address registers are actually used (though not in modern PCs): these registers hold the next DRAM memory address to be refreshed. The reason it is useless today is that the registers can only address 16 MB of memory, and the CPU needs to be involved to 'up' the page register per 64 KB, which it can't be, because the DMA controller has no ability to inform the CPU it needs to be updated, because IBM forgot to connect the pins of the DMA controller to interrupts because they cost-cut on building the original PC motherboard! (Please don't feed that sentence through a grammar checker.)

Note that the number of bytes transferred is one more than the count programmed for the channel (on the 16-bit channels 5-7 the count is in words). So programming count = 0 will transfer one byte, while programming count = 0xFFFF will transfer the full 64 KiB.
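A tiny C helper that converts a transfer length into the value to program, assuming lengths are given in the channel's transfer unit (bytes on channels 0-3, words on 5-7). The name is hypothetical.

```c
#include <stdint.h>

/* The controller transfers count + 1 units, so valid lengths are 1..65536.
 * Returns the count register value, or -1 if the length is out of range. */
static int32_t dma_count_value(uint32_t units)
{
    if (units == 0 || units > 65536)
        return -1;
    return (int32_t)(units - 1);
}
```

For one 1.44 MiB floppy track of 18 * 512 = 9216 (0x2400) bytes, it returns 0x23FF, matching the floppy example that follows.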

Floppy Disk DMA Programming

After all that ranting, the next section may come as something of a letdown. First, you need to implement only three routines to perform a DMA transfer. The example I am using is the floppy drive controller (probably the most common user of ISA DMA, followed by the Sound Blaster).

So putting it all together - let's set up a DMA buffer for channel 2 (the floppy disk).

initialize_floppy_DMA:
; set DMA channel 2 to transfer data from 0 - 64k in memory
; paging must map this _physical_ memory elsewhere and _pin_ it from paging to disk!
; set the counter to 0x23ff, the length of a track on a 1.44 MiB floppy (assuming 512 byte sectors)
; the length of transfer is programmed count+1 (check the data sheet if you want)
; OUT to an immediate port can only send AL, so every value goes through AL (AL is clobbered)
    mov al, 0x06
    out 0x0a, al        ; mask DMA channel 2
    mov al, 0xFF
    out 0x0c, al        ; reset the flip-flop (0x0c: the first 8237, which owns channel 2)
    xor al, al
    out 0x04, al        ; address to 0 (low byte)
    out 0x04, al        ; address to 0 (high byte)
    mov al, 0xFF
    out 0x0c, al        ; reset the flip-flop (again!!!)
    out 0x05, al        ; count to 0x23ff (low byte, 0xFF is still in AL)
    mov al, 0x23
    out 0x05, al        ; count to 0x23ff (high byte)
    xor al, al
    out 0x81, al        ; external page register to 0 for total address of 00 00 00
    mov al, 0x02
    out 0x0a, al        ; unmask DMA channel 2
    ret
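The same initialization in C, generalized to take the buffer's physical address and length. This is a sketch: outb is a hypothetical recording stub standing in for your kernel's port-output routine, the flip-flop is reset through port 0x0C (the first 8237, which owns channel 2), and the buffer is assumed to lie below 16 MiB without crossing a 64 KiB boundary.

```c
#include <stdint.h>

/* Hypothetical outb stand-in that records writes; replace with your own. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write trace[16];
static int trace_len;

static void outb(uint16_t port, uint8_t value)
{
    if (trace_len < 16)
        trace[trace_len++] = (struct port_write){ port, value };
}

/* Program channel 2 with a buffer at `phys`, `length` bytes long (1..65536).
 * The transfer direction is selected separately via the mode register. */
static void init_floppy_dma(uint32_t phys, uint32_t length)
{
    uint16_t count = (uint16_t)(length - 1);       /* controller does count+1 */

    outb(0x0A, 0x06);                              /* mask channel 2 */
    outb(0x0C, 0xFF);                              /* reset flip-flop */
    outb(0x04, (uint8_t)(phys & 0xFF));            /* address bits 7-0 */
    outb(0x04, (uint8_t)((phys >> 8) & 0xFF));     /* address bits 15-8 */
    outb(0x0C, 0xFF);                              /* reset flip-flop again */
    outb(0x05, (uint8_t)(count & 0xFF));           /* count low byte */
    outb(0x05, (uint8_t)(count >> 8));             /* count high byte */
    outb(0x81, (uint8_t)((phys >> 16) & 0xFF));    /* page: bits 23-16 */
    outb(0x0A, 0x02);                              /* unmask channel 2 */
}
```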

Please note: if you want to be stingy about memory, you can use the rest of the 64 KiB region above the maximum transfer size (0x2400 bytes in this case) for something else. Once you have set up the address and transfer count, you do not need to touch them again; to select reading or writing, you use the mode register.

prepare_for_floppy_DMA_write:
    mov al, 0x06
    out 0x0a, al        ; mask DMA channel 2
    mov al, 0x5A        ; 01011010
                        ; (single transfer, address increment, autoinit, read from memory = write to floppy, channel 2)
    out 0x0b, al        ; mode register
    mov al, 0x02
    out 0x0a, al        ; unmask DMA channel 2
    ret

prepare_for_floppy_DMA_read:
    mov al, 0x06
    out 0x0a, al        ; mask DMA channel 2
    mov al, 0x56        ; 01010110
                        ; (single transfer, address increment, autoinit, write to memory = read from floppy, channel 2)
    out 0x0b, al        ; mode register
    mov al, 0x02
    out 0x0a, al        ; unmask DMA channel 2
    ret
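And the two mode-switching routines in C, again with a hypothetical recording outb stub in place of real port I/O. Only the mode byte differs between them.

```c
#include <stdint.h>

/* Hypothetical outb stand-in that records writes; replace with your own. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write trace[16];
static int trace_len;

static void outb(uint16_t port, uint8_t value)
{
    if (trace_len < 16)
        trace[trace_len++] = (struct port_write){ port, value };
}

/* Change channel 2's mode while it is masked. */
static void floppy_dma_set_mode(uint8_t mode)
{
    outb(0x0A, 0x06);   /* mask channel 2 */
    outb(0x0B, mode);   /* mode register, first 8237 */
    outb(0x0A, 0x02);   /* unmask channel 2 */
}

/* Single transfer, increment, auto-init, channel 2. */
static void prepare_floppy_dma_write(void) { floppy_dma_set_mode(0x5A); } /* memory -> floppy */
static void prepare_floppy_dma_read(void)  { floppy_dma_set_mode(0x56); } /* floppy -> memory */
```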

The above uses single transfer mode only for compatibility. If, during the initialization of your floppy driver, you detect a newer floppy controller with a FIFO (for example with the controller's VERSION command), demand transfer should be used instead to reduce overhead. In this case, for a read, the mode value written to port 0x0b changes from:

0x56 (01010110, single transfer)

to:

0x16 (00010110, demand transfer).

Usage

Initialization

During the initialization of your floppy driver:

  • Map enough physical memory to virtual memory for the size of transfer you are going to do.
  • (No more than 64k; all transfers are to be done in 64k chunks.)
  • (Memory must begin on a 64k boundary somewhere between 0 and 16 MiB.)
  • Pin the pages allocated so you don't get page faults during reads or writes.
  • Call initialize_floppy_DMA altered with suitable values.
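A hypothetical C check for the buffer constraints in the list above, as a sketch for an 8-bit channel such as the floppy's channel 2:

```c
#include <stdbool.h>
#include <stdint.h>

/* The buffer must be 1..64 KiB long, start on a 64 KiB boundary (the scheme
 * used in this article) and end at or below the 16 MiB ISA DMA limit. */
static bool dma_buffer_ok(uint32_t phys, uint32_t length)
{
    if (length == 0 || length > 0x10000)
        return false;
    if ((phys & 0xFFFF) != 0)                     /* not 64 KiB aligned */
        return false;
    return (uint64_t)phys + length <= 0x1000000;  /* must end by 16 MiB */
}
```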
For a write
  • Call prepare_for_floppy_DMA_write.
  • Copy 18 sectors' worth of information to your DMA buffer.
  • Send a write command to the floppy controller for 18 sectors (say track 1).
  • The floppy will now write 18 sectors of information!
  • Copy 18 sectors' worth of information to your DMA buffer.
  • Send a write command to the floppy controller for 18 sectors (say track 2).

Note that there is no need to fiddle with the DMA addressing: thanks to auto-init, it automatically resets itself to the programmed address (0x0 here), ready for another transaction.

For a read
  • Call prepare_for_floppy_DMA_read.
  • Send a read command to the floppy controller for 18 sectors (say track 1).
  • Multiply the position of the sector you want within the track (counting from 0; CHS sector numbers start at 1) by 512 and use that as an offset into the DMA buffer.
  • On the next read you can see if the track you want is in the buffer already and go straight to it. (This is called track buffering.)
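A C sketch of the read path with track buffering. fdc_read_track and the other names are hypothetical (the stub just fills the buffer instead of talking to a controller), a "track" means the 18 sectors under one head as in the list above, and sector numbers are assumed to be 1-based as in CHS addressing.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE       512
#define SECTORS_PER_TRACK 18

/* Hypothetical driver state: the DMA buffer holds one track after a read. */
static uint8_t dma_buffer[SECTOR_SIZE * SECTORS_PER_TRACK];
static int cached_track = -1;   /* -1 = nothing buffered yet */
static int track_reads;         /* how many times the "controller" was used */

/* Hypothetical stand-in for "send a read command for the whole track";
 * it fills the buffer with a recognisable pattern instead. */
static void fdc_read_track(int track)
{
    memset(dma_buffer, track & 0xFF, sizeof dma_buffer);
    track_reads++;
}

/* Return a pointer to a sector in the track buffer, reading the track
 * only if it is not already buffered. Offset is (sector - 1) * 512. */
static const uint8_t *read_sector(int track, int sector)
{
    if (sector < 1 || sector > SECTORS_PER_TRACK)
        return NULL;
    if (track != cached_track) {
        fdc_read_track(track);
        cached_track = track;
    }
    return dma_buffer + (size_t)(sector - 1) * SECTOR_SIZE;
}
```

The second read from the same track never touches the controller, which is the whole point of track buffering.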

The good thing about this setup is that it is simple and fast. There is no need to set up a track caching scheme and associated code (apart from checking if the current track number is in the cache).

I wouldn't advise trying to improve the speed of floppy access any further; the floppy disk is so slow compared to the rest of the computer that another millisecond of your time is going to be unnoticeable. Perhaps the only real improvement that could be made is to read in track 0 on the first read of a floppy disk and buffer the FAT (File Allocation Table) or Inode information. This would only be useful if you were never going to write to the floppy, as the first append to a file will force the OS to update the FAT in memory, copy the altered track 0 to the DMA buffer and perform a floppy write. Putting it simply, there's no real way to improve the write performance of a floppy disk unless you are writing entire tracks of information at once and aiming for the fastest floppy formatting time record!

In Conclusion

As I've (constantly) pointed out, the DMA chip has lots of useful features. The main point in programming the chip is remembering the features that don't work. Regardless of the problems, the less the DMA chip is used the better, because of its negative impact on system performance (4.77 MHz, 8-bit data transfer). USB floppy disk drives are now available, and standard interface floppy disk drives are only an option on some systems. The future will only make this more so.
