User:Greasemonkey/Intel GenX: Difference between revisions

From OSDev.wiki

Revision as of 06:34, 29 May 2015

The Intel GenX GPU architecture (probably) covers the Intel HD and the later Intel GMA series of GPUs, starting with the Intel 965. Later revisions tend to add, remove, re-add and relocate features and instructions over time, but otherwise remain fairly similar.

This will cover Gen4 (963, 965, G35) and later. Earlier GPUs do not have open documentation and are apparently very different.

As there is an awful lot to document, this is probably a better place for tutorials.

This page is a stub.
You can help the wiki by accurately adding more contents to it.

Systems tested

  • Compaq CQ60-210TU: GM45 DevCTG (GMA 4500MHD; Cantiga) with 1366x768 LVDS panel
  • HP Pavilion dv6-6c35tx: i5-2450M DevSNB (HD 3000) with 1366x768 LVDS panel

Anything that doesn't exactly match the specs listed here may differ. Approach at your own risk.

Finding the GPU

Tested on:

  • GM45 1366x768 CQ60-210TU
  • HD3000 1366x768 dv6-6c35tx

Probe the PCI bus. It should be located at Bus 0, Device 2, Function 0.

GMAs tend to be advertised as at least two functions, so you may wish to check Function 1 as well. HDs tend to be advertised as just one.

Either way, there appears to be no difference between using BAR0 of Function 0 and BAR0 of Function 1, and only Function 0 gives the initial AGP aperture space information, so you might be able to get away with just using Function 0.

With that said, make sure you check the IDs to ensure that they match hardware that you've actually tested. Different GPUs have different bugs and thus require different workarounds - even when they're within the same family.
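
As a rough sketch of the probe, assuming the legacy PCI configuration mechanism (address port 0xCF8, data port 0xCFC): the helper below builds the address dword for a given bus/device/function/register, and a second helper checks a dword read from offset 0x00 for the Intel vendor ID (0x8086). Both function names are made up for this example, and the actual 32-bit port I/O is left out because it depends on your kernel.

```c
#include <stdint.h>

// Hypothetical helper: builds the value written to the CONFIG_ADDRESS
// port (0xCF8). The matching dword is then read from CONFIG_DATA (0xCFC).
static uint32_t pci_config_address(uint32_t bus, uint32_t dev,
                                   uint32_t func, uint32_t offset)
{
	return 0x80000000u          // enable bit
		| ((bus  & 0xFFu) << 16)
		| ((dev  & 0x1Fu) << 11)
		| ((func & 0x07u) << 8)
		| (offset & 0xFCu);     // dword-aligned register offset
}

// Hypothetical check on the dword read from config offset 0x00:
// the low 16 bits are the vendor ID, which is 0x8086 for Intel.
static int pci_is_intel(uint32_t id_dword)
{
	return (id_dword & 0xFFFFu) == 0x8086u;
}
```

For Bus 0, Device 2, Function 0, register 0x00 this gives 0x80001000. The upper 16 bits of the dword you read back are the device ID, which is what you would compare against hardware you have actually tested.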

Here's the information, assuming you are accessing everything through Function 0:

  • uint64_t @ PCI 0x10: Location of MMIO registers. (XXX: for devices with a Function 1, this appears to do more than just MMIO registers - Function 1 should be able to provide a space with "just" MMIO registers.)
  • uint64_t @ PCI 0x18: Location of the AGP stolen memory.

The 64-bit pointers have the lower 4 bits set to 0x4, so remember to mask those bits out before you use the address, and remember to put them back before writing the value back.
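
A minimal sketch of that masking, assuming you have already read the two config dwords that make up one 64-bit BAR (for example 0x10 and 0x14). The helper names are invented for this example. In standard PCI terms the low 4 bits of a memory BAR are flag bits, and 0x4 marks a 64-bit memory BAR.

```c
#include <stdint.h>

// Low 4 bits of a memory BAR are flag bits, not address bits.
#define BAR_FLAG_MASK 0xFull

// Hypothetical helper: combine the low and high config dwords of a
// 64-bit BAR and strip the flag bits to get the usable base address.
static uint64_t bar64_base(uint32_t lo, uint32_t hi)
{
	uint64_t raw = ((uint64_t)hi << 32) | lo;
	return raw & ~BAR_FLAG_MASK;
}

// Hypothetical helper: rebuild the raw BAR value for writing back,
// keeping the flag bits that were originally read from the low dword.
static uint64_t bar64_raw(uint64_t base, uint32_t orig_lo)
{
	return (base & ~BAR_FLAG_MASK) | (orig_lo & BAR_FLAG_MASK);
}
```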

Getting a display

Tested on:

  • GM45 1366x768 CQ60-210TU
  • HD3000 1366x768 dv6-6c35tx

Proper mode switch methodology

Consult the appropriate section of your manual. It tends to be in Volume 3 at the start of one of the parts.

  • G45: Vol3, pg 28
  • SNB: Vol3.2, pg 8

Simplified version

This assumes that the BIOS configured the LVDS panel properly and/or in a way that we can just take its values and run with them. If it does, you can avoid an awful lot of pain.

This also assumes that your panel is less than 2048 pixels wide, but that can be fixed by adding a few "if" statements. However, the 1366x768 assumption is dropped from this version.

There is a chance that this will work on systems that don't have an LVDS panel, but you will be limited to whatever resolution the BIOS spews out.

If you want a non-native resolution, enable the panelfitter and adjust PIPEnSRC to suit. Note, the GMA panelfitter does love to blur everything. (HD 3000 seems fine.)

Here's how you get from VGA text mode to native 32bpp full-res BGRX mode the easy way. By "easy" we mean that this is very much empirical, about as risky as that sounds, and assumes the BIOS doesn't have stupid bugs in it:

	// Set VGA screen off
	outportb(0x3C4, 0x01);
	outportb(0x3C5, inportb(0x3C5)|(1<<5));

	// Wait at least 100us (Gen4.5 only needs 20us, but Gen6 needs 100us)
	wait_us(100);

	// Get correct VGA pipe
	// WARNING: Gen5 changes the location of this register!
	// Pre-Gen5 uses 0x71400, Gen5+ uses 0x41000.
	int real_VGACNTRL = (genx_typ < 0x5000 ? VGACNTRL : VGACNTRL_ILK);
	genx_pipe = (genx_reg32[real_VGACNTRL]>>29)&1;

	// Disable VGA
	genx_reg32[real_VGACNTRL] |= (1<<31);

	// Disable Display n
	genx_reg32[DSPnCNTR(genx_pipe)] &= ~(1<<31);

	// Set PIPEnSRC to screen resolution
	uint32_t vis_w = (genx_reg32[HTOTAL_n(genx_pipe)] & 0xFFFF)+1;
	uint32_t vis_h = (genx_reg32[VTOTAL_n(genx_pipe)] & 0xFFFF)+1;
	genx_reg32[PIPEnSRC(genx_pipe)] = ((vis_w-1)<<16)|(vis_h-1);

	// XXX: Lacking information on DevSNB's panelfitter.
	// Seems to work fine without disabling it.
	// For Gen4.5, however, disabling it is pretty much mandatory,
	// unless you can't stand GPUs that lack antialiasing.
	if(genx_typ < 0x5000) // pre-Gen5 only, as per the comment above
	{
		genx_reg32[PIPEnCONF(genx_pipe)] &= ~(1<<31); // Disable Pipe n
		while((genx_reg32[PIPEnCONF(genx_pipe)] & (1<<30))) {} // Wait for Pipe n to stop

		genx_reg32[PFIT_CONTROL] &= ~(1<<31); // Disable panelfitter

		genx_reg32[PIPEnCONF(genx_pipe)] |= (1<<31); // Enable Pipe n
	}

	// Set up Display n and enable
	genx_reg32[DSPnLINOFF(genx_pipe)] = 0x00000000; // linear offset
	genx_reg32[DSPnSTRIDE(genx_pipe)] = 2048*4; // scanline pitch
	genx_reg32[DSPnSURF(genx_pipe)] = 0x00000000; // surface base
	genx_reg32[DSPnCNTR(genx_pipe)] = (genx_reg32[DSPnCNTR(genx_pipe)] & ~(15<<26)) | (6<<26); // bit depth select, 6 = 32bpp BGRX
	genx_reg32[DSPnCNTR(genx_pipe)] = (genx_reg32[DSPnCNTR(genx_pipe)] & ~(3<<20)) | (0<<20); // pixel multiply
	genx_reg32[DSPnCNTR(genx_pipe)] = (genx_reg32[DSPnCNTR(genx_pipe)] & ~(1<<10)) | (0<<10); // tiling flag
	genx_reg32[DSPnCNTR(genx_pipe)] |= (1<<31); // enable

With the above config, your framebuffer should be located at the start of the AGP aperture, and be 2048 32bpp BGRX pixels wide internally.
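
To make that layout concrete, here is a small sketch of addressing one pixel. It assumes the 2048-pixel pitch set above and that `fb` points at wherever you have mapped the start of the AGP aperture. All three helper names are made up for this example.

```c
#include <stdint.h>

#define FB_PITCH_PIXELS 2048u  // matches the DSPnSTRIDE of 2048*4 bytes

// Byte offset of pixel (x, y) from the start of the framebuffer.
static uint32_t fb_offset(uint32_t x, uint32_t y)
{
	return (y * FB_PITCH_PIXELS + x) * 4u;
}

// Pack 8-bit channels into one 32bpp BGRX pixel. Read as a
// little-endian dword this is 0xXXRRGGBB, with the X byte left at 0.
static uint32_t bgrx(uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

// Hypothetical plot routine: fb is the mapped framebuffer base.
static void fb_plot(volatile uint32_t *fb, uint32_t x, uint32_t y,
                    uint32_t colour)
{
	fb[fb_offset(x, y) / 4u] = colour;
}
```

Note that only the first vis_w pixels of each 2048-pixel row are visible. The rest of each row is spare memory.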

Setting monitor timings

Not tested on the dv6-6c35tx.

If you need to set the monitor timings, this code should be useful for a start. These timings work on the CQ60-210TU's surprisingly tolerant LVDS panel:

	uint32_t vis_w = 1366;
	uint32_t vis_stretch_w = vis_w;
	uint32_t vis_sblank_w = 80;
	uint32_t vis_sync_w = 128;
	uint32_t vis_eblank_w = 200;

	uint32_t vis_h = 768;
	uint32_t vis_stretch_h = vis_h;
	uint32_t vis_sblank_h = 3;
	uint32_t vis_sync_h = 5;
	uint32_t vis_eblank_h = 22;

	uint32_t vis_blank_w = vis_sblank_w + vis_sync_w + vis_eblank_w;
	uint32_t vis_blank_h = vis_sblank_h + vis_sync_h + vis_eblank_h;
	genx_reg32[HTOTAL_n(genx_pipe)] = ((vis_w+vis_blank_w-1)<<16) | (vis_w-1);
	genx_reg32[HBLANK_n(genx_pipe)] = ((vis_w+vis_blank_w-1)<<16) | (vis_w-1);
	genx_reg32[HSYNC_n(genx_pipe)] = ((vis_w+vis_sblank_w+vis_sync_w-1)<<16) | (vis_w+vis_sblank_w-1);
	genx_reg32[VTOTAL_n(genx_pipe)] = ((vis_h+vis_blank_h-1)<<16) | (vis_h-1);
	genx_reg32[VBLANK_n(genx_pipe)] = ((vis_h+vis_blank_h-1)<<16) | (vis_h-1);
	genx_reg32[VSYNC_n(genx_pipe)] = ((vis_h+vis_sblank_h+vis_sync_h-1)<<16) | (vis_h+vis_sblank_h-1);
	genx_reg32[PIPEnSRC(genx_pipe)] = ((vis_stretch_w-1)<<16)|(vis_stretch_h-1);
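
Every timing register above packs a (start, end) pair the same way, so the arithmetic can be sketched as one helper. The name `timing_pack` is made up for this example. The worked value uses the horizontal numbers above: 1366 visible pixels plus 80+128+200 blanking gives a total of 1774.

```c
#include <stdint.h>

// Hypothetical helper: pack a (start, end) pair the way the timing
// registers in the code above expect it: both values minus one,
// with the end in the high 16 bits and the start in the low 16 bits.
static uint32_t timing_pack(uint32_t start, uint32_t end)
{
	return ((end - 1u) << 16) | (start - 1u);
}
```

With those numbers, `timing_pack(1366, 1774)` comes out as 0x06ED0555, which is the value the code above writes to both HTOTAL_n and HBLANK_n.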

Getting the ring buffer to work

Tested on:

  • GM45 1366x768 CQ60-210TU
  • HD3000 1366x768 dv6-6c35tx

References for RING_BUFFER_* registers

  • G45: Vol1a, pg 238
  • HD3000: Vol1p3, pg 39

References for commands

  • G45: Vol1b
  • HD3000: Vol1p3-5

General notes

The ring buffer is vital for being able to send commands to the GPE (Graphics Processing Engine) and whatnot. Commands are at least 1 DWord long, but the tail index has to be aligned to a QWord.

Note, the tail indicates the end of the buffer, and is where you write your commands to. The head indicates the start of the buffer, and is where the GPU reads from. While the head is DWord-aligned, the tail is QWord-aligned, so you may need to pad your instructions by inserting an MI_NOOP (0x00000000 will do).

It's possible to start the ring buffer and then advance the tail when you have a new command or batch of commands.

RING_BUFFER_START holds the base address of the ring buffer, relative to the start of the AGP stolen memory space.

RING_BUFFER_HEAD and RING_BUFFER_TAIL need to be given byte offsets, so if you add, say, two DWords, you'd add 8 to RING_BUFFER_TAIL.
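
The tail bookkeeping described above could be sketched like this. It is a software-side model only: the struct and function names are invented for this example, the ring size is assumed to be a power of two, and there is no check against the head, so it will happily overwrite commands the GPU has not read yet.

```c
#include <stdint.h>

#define MI_NOOP 0x00000000u

// Hypothetical software-side view of the ring. "base" points at the
// ring's memory, "size" is its length in bytes (a power of two here),
// and "tail" is the byte offset of the next free DWord.
struct ring {
	volatile uint32_t *base;
	uint32_t size;
	uint32_t tail;
};

// Append one DWord at the tail, wrapping at the end of the ring.
static void ring_push(struct ring *rb, uint32_t dword)
{
	rb->base[rb->tail / 4u] = dword;
	rb->tail = (rb->tail + 4u) & (rb->size - 1u);
}

// Pad with an MI_NOOP if needed so the tail is QWord-aligned, then
// return the byte offset to write to RING_BUFFER_TAIL.
static uint32_t ring_finish(struct ring *rb)
{
	if (rb->tail & 4u)
		ring_push(rb, MI_NOOP);
	return rb->tail;
}
```

The idea is to push a whole batch with `ring_push`, then write the value returned by `ring_finish` to RING_BUFFER_TAIL once, so the GPU never sees a half-written command.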

Apparently you don't need a GTT to get this working.

If you want to check whether this is working, NOPID is a useful register. The Gen6 docs seem to be missing the location of this register; however, it is in the same location as on Gen4.5 (0x02094).

Using the blitter

Tested on:

  • GM45 1366x768 CQ60-210TU

The HD3000 was tested at some stage but the ring buffer stopped, suggesting an invalid instruction, although there may have also been a GTT issue.

General notes

Firstly you'll want to know your "raster op" modes. They are conceptually the same as the Amiga's modes, although probably with different sources.

Here's a list of useful modes:

  • 0xF0: Set to pattern (useful for COLOR_BLT)
  • 0xCC: Set to source image (useful for SRC_COPY_BLT)
  • 0xAA: Set to destination (useful for not much)
  • 0x55: Set to opposite of destination (useful for XOR effect)

Top bit determines the result when all of {pattern, source, destination} are 1. Bottom bit determines the result when all are 0.
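
One way to read this is as an 8-entry truth table, indexed per bit by the three inputs. The sketch below is a software model of that table, not something the hardware needs from you. The index order (pattern as the high bit, then source, then destination) is inferred from the modes listed above, and the function name is made up for this example.

```c
#include <stdint.h>

// Evaluate an 8-bit raster op on 32-bit pattern/source/destination
// values. For each bit position, the three input bits form an index
// (pattern<<2 | source<<1 | destination) into the raster op byte.
static uint32_t rop3_apply(uint8_t rop, uint32_t pat, uint32_t src,
                           uint32_t dst)
{
	uint32_t out = 0;
	for (int bit = 0; bit < 32; bit++) {
		uint32_t idx = (((pat >> bit) & 1u) << 2)
		             | (((src >> bit) & 1u) << 1)
		             |  ((dst >> bit) & 1u);
		out |= (uint32_t)((rop >> idx) & 1u) << bit;
	}
	return out;
}
```

Under that reading, 0x5A works out to pattern XOR destination and 0x66 to source XOR destination, which is how you would get a real XOR effect with a chosen colour.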

The rest is All There In The Manual.

WARNING: When the manual says "pitch in dwords", what they really mean is "pitch in bytes, aligned to a dword boundary".

There are two sets of blitter commands: The XY_ commands, and the other commands. The other commands are a bit simpler, but there are only two of them.

  • COLOR_BLT fills a rectangular area with a solid colour. Useful for clearing the screen like a boss.
  • SRC_COPY_BLT copies from one place in memory to another. If you have no GTT, this will be GPU-to-GPU only. Still faster, easier and more powerful than EGA/VGA.

Here's an example of a 32bpp COLOR_BLT used to clear a screen with a nice purple tinge:

	// COLOR_BLT
	genx_rb_push(0
		| (0x2<<29) | (0x40<<22)
		| (0x3<<20) // a:rgb mask
		| 0x03
	);
	genx_rb_push(0
		| (3<<24) // bit depth
		| (0xF0<<16) // raster op
		| ((screen_pitch_pixels*4) & 0xFFFF) // pitch in bytes, dword-aligned
	);
	genx_rb_push(((screen_height)<<16)|((screen_width)<<2)); // height in scanlines, width in bytes
	genx_rb_push(screen_agp_offset);
	genx_rb_push(0x00330066); // XXRRGGBB - HTML colour #330066
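
The same command can also be built into a plain array first, which makes it easier to inspect before it goes anywhere near the ring. This is only a sketch: the function name and parameters are invented for this example, the field layout is copied from the code above, and the sixth DWord is an MI_NOOP added so that the five-DWord command leaves the tail QWord-aligned (assuming it was aligned to begin with).

```c
#include <stdint.h>

// Hypothetical helper: fill out[0..5] with a 32bpp COLOR_BLT followed
// by one MI_NOOP of padding. Field layout as in the example above.
static void build_color_blt(uint32_t out[6], uint32_t dst_offset,
                            uint32_t pitch_bytes, uint32_t width_px,
                            uint32_t height, uint32_t colour)
{
	out[0] = (0x2u << 29) | (0x40u << 22)  // COLOR_BLT
	       | (0x3u << 20)                  // a:rgb mask
	       | 0x03u;                        // length field
	out[1] = (3u << 24)                    // bit depth
	       | (0xF0u << 16)                 // raster op: pattern
	       | (pitch_bytes & 0xFFFFu);      // pitch in bytes
	out[2] = (height << 16) | (width_px << 2); // scanlines, width in bytes
	out[3] = dst_offset;                   // destination offset
	out[4] = colour;                       // XXRRGGBB
	out[5] = 0x00000000u;                  // MI_NOOP padding
}
```

For a 1366x768 screen with a 2048-pixel pitch, the first three DWords come out as 0x50300003, 0x03F02000 and 0x03001558.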

The XY_ commands allow you to specify ranges using X,Y coordinate pairs and apply clipping based on those pairs.

Here's the XY_COLOR_BLT version of the above:

	// XY_COLOR_BLT
	genx_rb_push(0
		| (0x2<<29) | (0x50<<22)
		| (3<<20) // a:rgb mask
		| (0<<11) // tiling enable (tile-X only)
		| 0x04
	);
	genx_rb_push(0
		| (0<<30) // clipping enable
		| (3<<24) // bit depth
		| (0xF0<<16) // raster op
		| ((screen_pitch_pixels*4) & 0xFFFF) // pitch in bytes, dword-aligned
	);
	genx_rb_push((0<<16) | (0)); // Y1:X1 top-left
	genx_rb_push(((screen_height)<<16)|((screen_width))); // Y2:X2 bottom-right
	genx_rb_push(screen_agp_offset);
	genx_rb_push(0x00330066); // XXRRGGBB - HTML colour #330066

Note that if you want clipping, you'll need to run an XY_SETUP_CLIP_BLT command first, and then set the "clipping enable" flag.

Memory-to-GPU blit

TODO: Get the GTT working first. Only then will this work.

Nevertheless, this is probably faster than a memcpy.

Making the GTT behave

TODO: Succeed. Right now I'm failing. Miserably.

See Also

Articles

External Links