User:Greasemonkey/Intel GenX: Difference between revisions

(→‎Getting a display: Gen6-compatible code, and a few fixes)
 
(8 intermediate revisions by the same user not shown)
Line 1:
The Intel GenX GPU architecture (probably) covers the Intel HD and the later Intel GMA series of GPUs, starting with the Intel 965. Later revisions tend to add, remove, re-add and relocate features and instructions over time, but otherwise remain fairly similar.
 
This will cover Gen4 (963, 965, G35) and later. Earlier GPUs do not have open documentation and are apparently very different.
Line 22:
GMAs tend to be advertised as at least two functions, so you may wish to check Function 1 as well. HDs tend to be advertised as just one.
 
Either way, there appears to be no difference between using BAR0 of Function 0 and BAR0 of Function 1, and only Function 0 gives a window into stolen memory, so you might be able to get away with just using Function 0.
 
With that said, make sure you check the IDs to ensure that they match hardware that you've actually tested. Different GPUs have different bugs and thus require different workarounds - even when they're within the same family.
Line 28:
Here's the information, assuming you are accessing everything through Function 0:
* uint64_t @ PCI 0x10: Location of MMIO registers. (XXX: for devices with a Function 1, this appears to do more than just MMIO registers - Function 1 should be able to provide a space with "just" MMIO registers.)
* uint64_t @ PCI 0x18: Location of stolen memory.
 
The 64-bit pointers have the lower 4 bits set to 0x4, so remember to mask those bits out before you use the address, and remember to preserve them when writing the value back.
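A minimal sketch of that masking, assuming the two 32-bit halves of the BAR have already been read from PCI configuration space. The helper names are hypothetical and not part of any existing API:

```c
#include <stdint.h>

/* Low 4 bits of a memory BAR are flag bits (0x4 = 64-bit memory BAR). */
#define GENX_BAR_FLAG_MASK 0xFULL

/* Combine the two config-space DWords of a 64-bit BAR and strip the flags. */
static inline uint64_t genx_bar_base(uint32_t bar_lo, uint32_t bar_hi)
{
    uint64_t raw = ((uint64_t)bar_hi << 32) | bar_lo;
    return raw & ~GENX_BAR_FLAG_MASK;
}

/* Build a value that is safe to write back: new base, original flag bits. */
static inline uint64_t genx_bar_with_flags(uint64_t new_base, uint32_t old_bar_lo)
{
    return (new_base & ~GENX_BAR_FLAG_MASK) | (old_bar_lo & GENX_BAR_FLAG_MASK);
}
```

For a BAR that reads back as 0xF0000004, the usable base would be 0xF0000000.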
Line 103:
</pre>
 
With the above config, your framebuffer should be located at the start of the stolen memory, and be 2048 32bpp BGRX pixels wide internally.
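As a rough sketch of what that layout means for the CPU, here is how a single pixel could be addressed. It assumes you already have a CPU mapping of the start of stolen memory; the function names and the GENX_FB_PITCH_PIXELS constant are made up for this example:

```c
#include <stdint.h>
#include <stddef.h>

#define GENX_FB_PITCH_PIXELS 2048u /* internal width from the config above */

/* Pack 8-bit channels as BGRX: blue in the low byte, top byte unused. */
static inline uint32_t genx_bgrx(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Index of pixel (x, y) in a 32bpp framebuffer viewed as a uint32_t array. */
static inline size_t genx_fb_index(uint32_t x, uint32_t y)
{
    return (size_t)y * GENX_FB_PITCH_PIXELS + x;
}

/* fb must point at the CPU mapping of the start of stolen memory. */
static inline void genx_fb_plot(volatile uint32_t *fb, uint32_t x, uint32_t y,
                                uint32_t colour)
{
    fb[genx_fb_index(x, y)] = colour;
}
```

Note that the pitch is 2048 pixels regardless of the visible width, so a 1366-pixel-wide mode leaves the rest of each scanline off-screen.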
 
===Setting monitor timings===
Line 133:
genx_reg32[PIPEnSRC(genx_pipe)] = ((vis_stretch_w-1)<<16)|(vis_stretch_h-1);
</pre>
 
==Getting the ring buffer to work==
Tested on:
* GM45 1366x768 CQ60-210TU
* HD3000 1366x768 dv6-6c35tx
 
===References for RING_BUFFER_* registers===
* G45: Vol1a, pg 238
* HD3000: Vol1p3, pg 39
 
===References for commands===
* G45: Vol1b
* HD3000: Vol1p3-5
 
===General notes===
 
The ring buffer is vital for being able to send commands to the GPE (Graphics Processing Engine) and whatnot. Commands are at least 1 DWord long, but the ring buffer indices are aligned to the nearest QWord.
 
Note that the tail marks the end of the queued commands, and is where you write new commands. The head marks the start, and is where the GPU reads from. While the head is DWord-aligned, the tail is QWord-aligned, so you may need to pad your instructions by inserting an MI_NOOP (0x00000000 will do).
 
It's possible to start the ring buffer and then advance the tail when you have a new command or batch of commands.
 
RING_BUFFER_START holds the ring buffer's address, relative to the start of stolen memory.
 
RING_BUFFER_HEAD and RING_BUFFER_TAIL need to be given byte offsets, so if you add, say, two DWords, you'd add 8 to RING_BUFFER_TAIL.
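The code samples on this page call genx_rb_push and genx_rb_punch without defining them. The following is only a guess at their general shape, using a hypothetical struct and differently named functions so as not to claim it is the original code. It assumes the ring is already started, and it does not check the head for free space:

```c
#include <stdint.h>

#define MI_NOOP 0x00000000u

/* Hypothetical software-side state for one ring; not a hardware structure. */
struct genx_ring {
    volatile uint32_t *base;     /* CPU mapping of the ring memory */
    uint32_t size_bytes;         /* total ring size in bytes */
    uint32_t tail;               /* software copy of the tail, in bytes */
    volatile uint32_t *tail_reg; /* points at RING_BUFFER_TAIL */
};

/* Queue one DWord at the tail and advance by 4 bytes, wrapping at the end. */
static inline void genx_ring_push(struct genx_ring *rb, uint32_t dword)
{
    rb->base[rb->tail / 4] = dword;
    rb->tail = (rb->tail + 4) % rb->size_bytes;
}

/* Pad to a QWord boundary if needed, then publish the new tail to the GPU. */
static inline void genx_ring_punch(struct genx_ring *rb)
{
    if (rb->tail & 7)
        genx_ring_push(rb, MI_NOOP);
    *rb->tail_reg = rb->tail;
}
```

Keeping a software copy of the tail means the register is only written once per batch, and only ever with a QWord-aligned byte offset.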
 
Apparently you don't need a GTT to get this working.
 
If you want to check to see if this is working, NOPID is a useful register. The Gen6 docs seem to be missing the location of this register, however it is in the same location as Gen4.5 (0x02094).
 
==Using the blitter==
Tested on:
* GM45 1366x768 CQ60-210TU
 
The HD3000 was tested at some stage but the ring buffer stopped, suggesting an invalid instruction, although there may have also been a GTT issue.
 
===General notes===
 
Firstly you'll want to know your "raster op" modes. They are conceptually the same as the Amiga's modes, although probably with different sources.
 
Here's a list of useful modes:
* 0xF0: Set to pattern (useful for COLOR_BLT)
* 0xCC: Set to source image (useful for SRC_COPY_BLT)
* 0xAA: Set to destination (useful for not much)
* 0x55: Set to opposite of destination (useful for XOR effect)
 
Top bit determines the result when all of {pattern, source, destination} are 1. Bottom bit determines the result when all are 0.
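To make the encoding concrete, here is a small software model of an 8-bit raster op. It assumes the usual ROP3 bit ordering (pattern selects bit 2 of the index, source bit 1, destination bit 0), which is inferred from the modes listed above rather than taken from the manual. The function name is hypothetical:

```c
#include <stdint.h>

/* Apply an 8-bit raster op to 32-bit pattern, source and destination words.
   At each bit position, the (P,S,D) bits form a 3-bit index into rop, and
   the rop bit at that index is the result bit. */
static inline uint32_t genx_rop3(uint8_t rop, uint32_t p, uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        if (!(rop & (1u << idx)))
            continue;
        /* Select the bit positions whose (P,S,D) combination equals idx. */
        uint32_t m = ((idx & 4) ? p : ~p)
                   & ((idx & 2) ? s : ~s)
                   & ((idx & 1) ? d : ~d);
        out |= m;
    }
    return out;
}
```

Under that assumption, 0x5A works out to pattern XOR destination and 0x66 to source XOR destination, which are the modes to reach for if you want a real XOR effect.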
 
The rest is All There In The Manual.
 
'''WARNING:''' When the manual says "pitch in dwords", what they ''really'' mean is "pitch in bytes, aligned to a dword boundary".
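As a sketch of what that means in practice, the pitch can be computed in bytes and rounded up to a DWord boundary. The helper name is hypothetical, and the display side may well need stricter alignment than this:

```c
#include <stdint.h>

/* Bytes per scanline, rounded up to the next multiple of 4 (one DWord). */
static inline uint32_t genx_pitch_bytes(uint32_t width_pixels,
                                        uint32_t bytes_per_pixel)
{
    return (width_pixels * bytes_per_pixel + 3u) & ~3u;
}
```

For the 2048-pixel-wide 32bpp framebuffer used on this page, this gives 8192, which is the value that goes into the pitch field.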
 
There are two sets of blitter commands: The XY_ commands, and the other commands. The other commands are a bit simpler, but there are only two of them.
 
*COLOR_BLT fills a rectangular area with a solid colour. Useful for clearing the screen like a boss.
*SRC_COPY_BLT copies from one place in memory to another. If you have no GTT, this will be GPU-to-GPU only. Still faster, easier and more powerful than EGA/VGA.
 
Here's an example of a 32bpp COLOR_BLT used to clear a screen with a nice purple tinge:
<pre>
// COLOR_BLT
genx_rb_push(0
| (0x2<<29) | (0x40<<22)
| (0x3<<20) // a:rgb mask
| 0x03
);
genx_rb_push(0
| (3<<24) // bit depth
| (0xF0<<16) // raster op
| ((screen_pitch_pixels*4) & 0xFFFF) // pitch in bytes, dword-aligned
);
genx_rb_push(((screen_height)<<16)|((screen_width)<<2)); // height in scanlines, width in bytes
genx_rb_push(screen_stolen_memory_offset);
genx_rb_push(0x00330066); // XXRRGGBB - HTML colour #330066
</pre>
 
The XY_ commands allow you to specify ranges using X,Y coordinate pairs and apply clipping based on those pairs.
 
Here's the XY_COLOR_BLT version of the above:
<pre>
// XY_COLOR_BLT
genx_rb_push(0
| (0x2<<29) | (0x50<<22)
| (3<<20) // a:rgb mask
| (0<<11) // tiling enable (tile-X only)
| 0x04
);
genx_rb_push(0
| (0<<30) // clipping enable
| (3<<24) // bit depth
| (0xF0<<16) // raster op
| ((screen_pitch_pixels*4) & 0xFFFF) // pitch in bytes, dword-aligned
);
genx_rb_push((0<<16) | (0)); // Y1:X1 top-left
genx_rb_push(((screen_height)<<16)|((screen_width))); // Y2:X2 bottom-right
genx_rb_push(screen_stolen_memory_offset);
genx_rb_push(0x00330066); // XXRRGGBB - HTML colour #330066
</pre>
 
Note, if you want clipping, you'll want to run an XY_SETUP_CLIP_BLT command, and then enable the "clipping enable" flag.
 
==Making the GTT behave==
Tested on:
* GM45 1366x768 CQ60-210TU
 
===References for GTT page format===
* G45: Vol1a, pg214
 
===Finding GTTADR===
'''Gen4 only (NOT Gen4.5!)''': 32-bit address "GTTADR" at PCI B0:D2:F0:0x1C. ''(TODO: confirm)''
 
'''Gen4.5 and above''': 64-bit address "GTTMMADR" at PCI B0:D2:F0:0x10, then add 2MB (0x200000). ''(TODO: confirm the "above" bit)''
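The two cases can be sketched as follows. Both functions take the raw BAR value as already read from PCI configuration space, so no particular config-space accessor is assumed. The names are hypothetical, and the Gen4 case carries the same "TODO: confirm" caveat as the text:

```c
#include <stdint.h>

#define GENX_GTT_OFFSET_IN_GTTMMADR 0x200000ULL /* 2MB, per the text above */

/* Gen4.5 and above: the GTT window sits 2MB into the GTTMMADR range.
   raw_bar is the 64-bit BAR value including its low flag bits. */
static inline uint64_t genx_gttadr_gen45(uint64_t raw_bar)
{
    return (raw_bar & ~0xFULL) + GENX_GTT_OFFSET_IN_GTTMMADR;
}

/* Gen4: GTTADR is its own 32-bit BAR; only the flag bits need masking. */
static inline uint32_t genx_gttadr_gen4(uint32_t raw_bar)
{
    return raw_bar & ~0xFu;
}
```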
 
===Allocating space for the GTT===
====Early Gen4====
Allocate a block of memory in the stolen memory space. 512KB is the largest you can use for the GTT, and allows for a 512MB virtual address space (one 4-byte entry per 4KB page). Ensure that the block of memory is aligned to its size.
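The size arithmetic and the alignment check can be written down as a small sketch. The names are hypothetical, and the alignment helper assumes the GTT size is a power of two:

```c
#include <stdint.h>

#define GENX_GTT_ENTRY_BYTES 4u    /* one 32-bit entry per page */
#define GENX_PAGE_BYTES      4096u

/* Bytes of GPU virtual address space covered by a GTT of the given size. */
static inline uint64_t genx_gtt_coverage(uint32_t gtt_bytes)
{
    return (uint64_t)(gtt_bytes / GENX_GTT_ENTRY_BYTES) * GENX_PAGE_BYTES;
}

/* Nonzero if offset is aligned to gtt_bytes (gtt_bytes: a power of two). */
static inline int genx_gtt_aligned(uint32_t offset, uint32_t gtt_bytes)
{
    return (offset & (gtt_bytes - 1u)) == 0;
}
```

A 512KB table holds 131072 entries, each mapping 4KB, which is where the 512MB figure comes from.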
 
Once you have it in place, ensure that the graphics pipeline is flushed (if you don't know what this is, it probably already is flushed), then:
 
<pre>
genx_reg32[GFX_FLSH_CNTL] = 0;
genx_reg32[PGTBL_CTL] = 1 | gtt_offset; // GTT: 512KB, enabled
genx_reg32[PGTBL_CTL2] = 0; // disables the PPGTT
genx_reg32[GFX_FLSH_CNTL] = 0;
</pre>
 
We will get to modifying it pretty soon.
 
*Paging type 0 is for stolen memory.
*Paging type 3 is for main CPU memory. The GPU will snoop the cache for you.
 
Note that the actual screen is rendered using physical, unmapped "stolen" memory addresses.
 
Also note that direct access to the stolen memory via GMADR also uses the physical unmapped addresses.
 
====Gen4 "Bearlake-C" (G35?) and onwards====
 
Don't allocate it. The chip allocates it for you. Just leave bits 31:12 of PGTBL_CTL intact when you mess with it. Once you have identity paging in place, set the lowest bit (bit 0) to enable the GTT.
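A read-modify-write along those lines might look like this sketch. The mask names and helper functions are made up for the example; only the bit positions come from the text:

```c
#include <stdint.h>

#define PGTBL_CTL_ADDR_MASK 0xFFFFF000u /* bits 31:12, set up by the chip */
#define PGTBL_CTL_ENABLE    0x00000001u /* bit 0 */

/* GTT base address that the chip placed in PGTBL_CTL. */
static inline uint32_t genx_pgtbl_ctl_base(uint32_t current)
{
    return current & PGTBL_CTL_ADDR_MASK;
}

/* New PGTBL_CTL value: only the enable bit changes, so the address bits
   (and everything else) are kept from the current value. */
static inline uint32_t genx_pgtbl_ctl_set_enable(uint32_t current, int enable)
{
    uint32_t v = current & ~PGTBL_CTL_ENABLE;
    return enable ? (v | PGTBL_CTL_ENABLE) : v;
}
```

In the style of the other snippets on this page, the enable step would then be genx_reg32[PGTBL_CTL] = genx_pgtbl_ctl_set_enable(genx_reg32[PGTBL_CTL], 1);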
 
Of course, if you are paranoid, you can always allocate some memory anyway.
 
Paging types above are as per pre-Gen6. Gen6 has different paging types, apparently.
 
===Identity paging===
 
In this example, genx_gtt32 points to GTTADR as calculated.
 
The GPU will handle all the caching issues for you if you use GTTADR. To add to this, in Gen6 this is the only way to access the GTT, so instead of learning the older method of writing via system RAM and then flushing the GPU's cache, you should just use this instead.
 
<pre>
for(i = 0; i < 512*256; i++)
    genx_gtt32[i] = (((i)<<12) | (3<<1) | (1<<0));
</pre>
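If you want something other than a blanket identity map, the entry format used in the loop above can be wrapped in a small helper. This is a sketch for pre-Gen6 only, assuming 4KB pages below 4GB and the paging types described earlier; the names are hypothetical:

```c
#include <stdint.h>

#define GENX_PTE_VALID       (1u << 0)
#define GENX_PTE_TYPE_STOLEN 0u /* pre-Gen6 paging type 0 */
#define GENX_PTE_TYPE_SYSTEM 3u /* pre-Gen6 paging type 3 (snooped) */

/* Build a pre-Gen6 GTT entry: page address, paging type, valid bit. */
static inline uint32_t genx_gtt_pte(uint32_t phys_addr, uint32_t type)
{
    return (phys_addr & 0xFFFFF000u) | (type << 1) | GENX_PTE_VALID;
}
```

With this, entry i of the identity map is genx_gtt_pte(i << 12, GENX_PTE_TYPE_SYSTEM), and a page of stolen memory would use GENX_PTE_TYPE_STOLEN instead.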
 
===Memory-to-GPU blit (and vice versa)===
 
This is for a 32bpp blit.
 
* blk_(width|height) denotes the size of the blit to perform.
* (src|dest)_gtt are addresses that the GPU will feed through the Global GTT or PPGTT.
* (src|dest)_pitch are the image pitches in DWords.
 
You must ensure that the GTT has the correct paging type for the given GPU for each page that this will need to use. For Gen4, use 0 for stolen memory, and 3 for system memory.
 
<pre>
// SRC_COPY_BLT
genx_rb_push((2<<29) | (0x43<<22)
| (3<<20) // a:rgb mask
| 0x04
);
genx_rb_push(0
| (0<<30) // reverse X direction
| (3<<24) // bit depth
| (0xCC<<16) // raster op
| (dest_pitch*4) // dest pitch in bytes
);
genx_rb_push((blk_height<<16) | (blk_width*4)); // dest dims
genx_rb_push(dest_gtt);
genx_rb_push(0
| (src_pitch*4) // src pitch in bytes
);
genx_rb_push(src_gtt); // src addr
genx_rb_punch();
</pre>
 
XY_SRC_COPY_BLT and whatnot should also work just fine, including with clipping and the like.
 
==See Also==
Line 141 ⟶ 327:
* [https://01.org/linuxgraphics/documentation/driver-documentation-prms Official documentation from Intel]
* [http://www.x.org/docs/intel/ Official documentation on X.org] - covers up to 2012, but also has Vol_1b_G45_core.pdf which is missing from the official Intel PRM list
* Chipset datasheets:
** [http://www.intel.com/Assets/PDF/datasheet/320122.pdf Mobile Intel 4 Series Datasheet] - covers the mobile version of Gen4.5.
* [http://forums.entechtaiwan.com/index.php?topic=2578.0 1366x768 LCD timings] - thread with several different sets of timings
* [http://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units List of Intel graphics processing units] on Wikipedia - useful for finding out PCI device IDs and exactly what generation you're using.