Latest Intel Graphics Developer's Guide
Note: The Developer's Guide on this page is for the Intel® Graphics Media Accelerator used in the Intel® GMA X3000 chipset. Find the latest Intel Graphics Developer's Guide.
Introduction
The Intel® GMA X3000 is the fourth generation of Intel Integrated Graphics. As with previous generations, the Intel® GMA X3000 chipset adds capabilities that give consumers a wealth of features at much lower cost than discrete graphics solutions. As these integrated solutions become more commonplace in the market, it becomes ever more important for 3D developers to support and target the feature set of Intel’s integrated solution. Intel Graphics is designed to meet the display needs of the majority of business and consumer users. The graphics core is built into the chipset and shares system memory with the CPU, which keeps the system architecture balanced at a compelling cost for the customer. Intel’s GMA capabilities match or exceed many more expensive graphics card solutions. PC buyers have appreciated this balanced approach to system design, and Intel Graphics is currently the number one graphics solution chosen by new PC purchasers.
This document describes the Intel® Graphics Media Accelerator (Intel® GMA 3000 and X3000) and provides development hints and tips to ensure that your customers will have a great time playing your games and running other interactive 3D graphics applications. We welcome feedback from the ISV community and our customers. Please give your feedback by going to Forums.
Intel® GMA 3000 and X3000 Overview
Intel® GMA X3000 is the first generation of Intel Integrated Graphics able to perform hardware vertex processing for Shader Model 3.0. Additionally, while previous hardware provided only a subset of DirectX 9.0 features and OpenGL 1.4 compliance, the GMA X3000 feature set includes the full DirectX 9.0C/Ex feature set as well as OpenGL 2.0. The core frequency has been increased to 667 MHz, and the maximum memory available to the graphics system has been increased from 224 MB to 384 MB. This enables more textures, larger vertex buffers, and larger models to be stored. Finally, the peak memory bandwidth has been increased by 20% over the previous generation of Intel Integrated Graphics for the desktop, to a peak of 12.8 GB/s. GMA X3000 graphics supports both Windows XP* and Windows Vista*.
A chart summarizing these improvements:
Features | 2005 / 2006 | 2006 / 2007 | 2007 | 2007/2008 | 2007/2008 |
Platform (Desktop/ Mobile) | Intel® 945G / 945GM Chipset | Intel® G965 (X3000) Chipset | Intel® GM965 (X3100) Chipset | Intel® G31/G33 Chipset | Intel® G35 Chipset |
Manufacturing Process | 130 nm | 90 nm | 90 nm | 90 nm | 90 nm |
Scalable Core Frequency** | 400/133, 166, 222, 250 MHz | 667 MHz | 500 MHz | 400 MHz | 667 MHz |
Memory Frequency | Up to 2 Ch DDR 667 | Up to 2 Ch DDR2 800 | Up to 2 Ch DDR2 667 | Up to 2 Ch DDR3 1066 | Up to 2 Ch DDR2 800 |
Peak Memory BW | 10.7 GBps | 12.8 GBps | 10.7 GBps | 17.1 GBps | 12.8 GBps |
Max Video Memory* | 224MB | 384MB | 384MB | 287MB? | 384MB |
DirectX* API Support | DirectX* 9.0C/Ex | DirectX* 9.0C/Ex | DirectX* 10 | DirectX* 9.0C/Ex | DirectX* 10 |
OpenGL API Support | 1.4 + Extensions | 1.4 + Extensions | 1.5 | 1.4 + Extensions | 2.0 |
DirectX* VA Support | v.1.0 (2.0 in LH) | v.2.0 | v.2.0 | v.2.0 | v.2.0 |
Driver Model Support | XPDM + LDDM | XPDM and LDDM (Basic Scheduler) |
Intel® GMA 3000 and X3000 Features
This section provides information on the key new features of GMA (X)3000 and provides a comparison to prior generation parts. The features are grouped into functional categories with one chart for each area. The final section describes the shared memory architecture used by the GMA (X)3000.
Pixel Shader
Pixel Shader | 2005 / 2006 | 2006 / 2007 | 2007 | 2007/2008 | 2007/2008 |
Platform (Desktop/ Mobile) | Intel® 945G / 945GM Chipset | Intel® G965 Chipset (X3000 series) | Intel® GM965 Chipset (X3000 series) | Intel® G31/G33 Chipset (3000 series) | Intel® G35 Chipset (X3000 series) |
Pixel Shader Model | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 |
Shader Precision | 24 bit floating point | 32 bit floating point | 24 bit floating point | 24 bit floating point | 24 bit floating point |
Max Samplers | 8 | 16 | 16 | 8 | 16 |
Max Shader Instructions | 96 | 512 | 512 | 96 | 512 |
Dependent Textures | 4 | 512 | 512 | 4 | 512 |
Dynamic Branching | N | Y | Y | N | Y |
Max Texture Instructions | 32 | 512 | 512 | 32 | 512 |
Texture Sampler
Texture Sampler | 2005 / 2006 | 2006 / 2007 | 2007 | 2007/2008 | 2007/2008 |
Platform (Desktop/ Mobile) | Intel® 945G / 945GM Chipset | Intel® G965 Chipset | Intel® GM965 Chipset | Intel® G31/G33 Chipset | Intel® G35 Chipset |
Compute Precision | 24 bits floating point | 16 and 32 bits floating point | 16 and 32 bits floating point | 24 bits floating point | 16 and 32 bits floating point |
Max 2d texture | 2K x 2K | 4K x 4K | 4K x 4K | 2K x 2K | 4K x 4K |
Max 3d texture | 2048 x 2048 x 256 | 8092 x 8092 x 256 | 8092 x 8092 x 256 | 2048 x 2048 x 256 | 8092 x 8092 x 256 |
Max cube map | 1024 | 1024 | 1024 | 1024 | 1024 |
Max Anisotropy | 4 sub-samples | Up to 16 sub-samples | Up to 16 sub-samples | 4 sub-samples | Up to 16 sub-samples |
Compressed textures | DXT1, DXT3, DXT5 and FXTn | DXT1, DXT3, DXT5 and FXTn | DXT1, DXT3, DXT5 and FXTn | DXT1, DXT3, DXT5 and FXTn | DXT1, DXT3, DXT5 and FXTn |
Non power of 2 texture sizes | Yes | Yes | Yes | Yes | Yes |
Render to texture | Yes | Yes | Yes | Yes | Yes |
Color and Z Buffers
Color and Z Buffers | 2005 / 2006 | 2006 / 2007 | 2007 | 2007/2008 | 2007/2008 |
Platform (Desktop/ Mobile) | Intel® 945G / 945GM Chipset | Intel® G965 Chipset | Intel® GM965 Chipset | Intel® G31/G33 Chipset | Intel® G35 Chipset |
Compute Precision | 24 bit floating point | 32 bit floating point | 32 bit floating point | 24 bit floating point | 32 bit floating point |
Max 2d texture | 2K x 2K | 4K x 4K | 4K x 4K | 2K x 2K | 4K x 4K |
Max 3d texture | 128x128x128 | 128x128x128 | 128x128x128 | 128x128x128 | 128x128x128 |
Max cube map | 1024 | 1024 | 1024 | 1024 | 1024 |
Max Anisotropy | 4 sub-samples | Up to 16 sub-samples | Up to 16 sub-samples | 4 sub-samples | Up to 16 sub-samples |
Compressed textures | DXT1,DXT3,DXT5 and FXTn | DXT1,DXT2-5 and FXTn | DXT1,DXT2-5 and FXTn | DXT1,DXT3,DXT5 and FXTn | DXT1,DXT2-5 and FXTn |
Non power of 2 texture sizes | Yes | Yes | Yes | Yes | Yes |
Render to texture | Yes | Yes | Yes | Yes | Yes |
Vertex Shader
Vertex Shader | 2005 / 2006 | 2006 / 2007 | 2007 | 2007/2008 | 2007/2008 |
Platform (Desktop/ Mobile) | Intel® 945G / 945GM Chipset | Intel® G965 Chipset | Intel® GM965 Chipset | Intel® G31/G33 Chipset | Intel® G35 Chipset |
Vertex Texture | SW | HW | HW | SW | HW |
Instancing | SW | HW | HW | SW | HW |
Dynamic Flow | SW | HW | HW | SW | HW |
Vertex Shader Model | 3.0 in SW | 3.0 in HW | 3.0 in HW | 3.0 in SW | 3.0 in HW |
Dynamic Video Memory
Intel Graphics utilize a shared memory architecture (often referred to as a unified memory architecture or UMA) – system memory is used for both graphics and system purposes. Instead of using dedicated local memory, as is the case on the majority of discrete graphics cards today, a portion of the system memory is allocated to be used as video memory. Additionally, a small amount of system memory is permanently allocated to video memory by the BIOS. This amount is usually one or eight megabytes (most OEMs set eight megabytes). Systems shipping with a Windows Vista* logo are required to set eight megabytes of pre-allocated memory.
Dynamic Video Management Technology (DVMT) allows additional system memory to be dynamically allocated for graphics usage based on application need. Once the application is closed, the memory that was allocated is released and is then available for system use. The purpose of dynamically allocating memory for graphics use is to ensure a solid balance between system performance and graphics performance. For example, if a user is simply editing text, there is no need for the graphics to take up a large amount of the system’s memory. In such a case, it is best if more memory is allocated to the system. On the other hand, if the user starts up a 3D game, more of the shared memory needs to be used as graphics memory.
Windows XP*
On boot-up, the system’s BIOS code determines the amount of system memory to be permanently used by the graphics controller. Once selected, this memory is never given back to the system. Some systems allow end users to adjust this value via BIOS setup options. Once the operating system is started, the graphics driver dynamically allocates graphics memory based on requests from each application run by the user. For systems with 256 MB or more of memory, a maximum of 128 MB will be allocated for use by the graphics controller (memory set aside by the BIOS + memory dynamically allocated by the driver). In addition to this, a new BIOS setting can provide 128 MB of fixed memory or even 384 MB of memory for systems with 512 or more megabytes of memory.
Windows Vista*
On systems running Windows Vista* the graphics driver will ignore the system BIOS settings for memory allocation due to differing operating system requirements from Windows XP. Instead the graphics driver will allocate a combination of fixed and dynamic memory based on the amount of system memory detected. This enables BIOS vendors to create a unified system BIOS for Windows XP* and Vista systems. A certain amount of fixed memory is required so that the driver can ensure a mode change from the current mode to any supported mode. For systems with 512 megabytes of system memory (minimum requirement for Windows Vista*) the driver will allocate 32 megabytes of fixed video memory (8MB from pre-allocated) and up to 32 megabytes of dynamic video memory for a total of 64 megabytes of video memory. If the system has more than 768 megabytes of memory then the graphics driver will allocate 64 megabytes of fixed video memory and the rest will be allocated dynamically depending on how much system memory there is. The chart below shows the specific memory allocation.
System Memory | Fixed | Dynamic | Total |
512MB | 32MB | 32MB | 64MB |
768MB – 1023MB | 64MB | 64MB | 128MB |
1024MB – 1535MB | 64MB | 192MB | 256MB |
>=1536MB (G965,GM965,G31,G33,G35) | 64MB | 320MB | 384MB |
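The chart can be read as a simple lookup keyed on installed system memory. The following is a minimal sketch of that lookup, not driver code: the type `VistaVideoMemory` and the helper `vistaVideoMemoryFor` are hypothetical names, the total is computed as fixed plus dynamic, and sizes that fall between the listed rows are assumed to use the row below them.

```cpp
#include <cassert>
#include <cstdint>

// Fixed and dynamic video memory in megabytes, as listed in the chart above.
struct VistaVideoMemory {
    std::uint32_t fixedMB;
    std::uint32_t dynamicMB;
    // Total is derived here as fixed + dynamic.
    std::uint32_t totalMB() const { return fixedMB + dynamicMB; }
};

// Hypothetical helper: maps installed system memory (MB) to a chart row.
// Assumption: sizes between listed rows use the row below them.
inline VistaVideoMemory vistaVideoMemoryFor(std::uint32_t systemMB) {
    assert(systemMB >= 512 && "Windows Vista requires at least 512 MB");
    if (systemMB < 768)  return {32, 32};
    if (systemMB < 1024) return {64, 64};
    if (systemMB < 1536) return {64, 192};
    return {64, 320};
}
```

For example, `vistaVideoMemoryFor(2048)` yields 64 MB fixed and 320 MB dynamic, which matches the last row of the chart.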
Intel® GMA X3000 Architecture
The GMA X3000 is built on a new architecture that improves upon previous Intel graphics architectures through the use of a fully programmable and scalable array of execution units. This scalable design allows the number of execution units to be increased easily as manufacturing capabilities improve, without a major architecture change. The result is a consistent and stable platform that evolves to higher performance levels over time.
The GMA X3000 architecture marks a departure from the zone rendering architecture. While the diagram below shows that the overall system architecture remains largely unchanged, a great number of graphics improvements have been made to make up for the benefits once provided by zone rendering. These include a larger graphics aperture size, an increase in core clock frequency to 667 MHz, and an increase in the system bus speed to 1066 MHz.
Figure 2. Chipset layout including GMCH, ICH and main memory.
Further performance increases over zone rendering are achieved through the high level of programmability of the GMA X3000 shader execution architecture. Because of this, and because the same execution units run both vertex and pixel shader programs, the architecture is often referred to as a “unified shader architecture”. The advantage is that the amount of processing power applied to vertices and pixels can be dynamically balanced according to the needs of a particular frame of an application. Architectures lacking unified shaders may leave vertex shaders idle while the pixel shaders are overloaded on frames that contain large triangles. Conversely, frames that contain many small triangles tend to result in idle pixel shaders while the vertex shaders are overloaded. The unified approach assigns execution units to vertices or pixels as needed. This minimizes idle execution units and provides a better price/performance ratio because, in theory, the end user avoids paying for silicon that is idle much of the time.
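The load-balancing argument can be illustrated with a toy cost model. This is only a sketch under a strong assumption (every execution unit retires one unit of shader work per time step); it is not a model of the actual hardware scheduler, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Toy cost model (an assumption for illustration, not a hardware simulator):
// every execution unit retires one unit of shader work per time step.

// Split design: vertex and pixel units cannot help each other, so the frame
// takes as long as the busier group.
inline double splitFrameTime(double vertexWork, double pixelWork,
                             int vertexUnits, int pixelUnits) {
    assert(vertexUnits > 0 && pixelUnits > 0);
    return std::max(vertexWork / vertexUnits, pixelWork / pixelUnits);
}

// Unified design: any unit can take vertex or pixel work, so the whole pool
// shares the combined load.
inline double unifiedFrameTime(double vertexWork, double pixelWork, int units) {
    assert(units > 0);
    return (vertexWork + pixelWork) / units;
}
```

With a pixel-heavy frame of 100 vertex and 700 pixel work units, a 4 + 4 split takes 175 steps in this model while a unified pool of 8 takes 100. When the load is evenly balanced, the two designs tie.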
Figure 3. G965, GM965 and G35 (X3000) Pipelines
Pipeline Stages
Pipeline Stage | Functions Performed |
Command Stream | This stage is responsible for managing the 3D pipeline and passing commands down the 3D pipeline. Additionally it reads “constant data” from memory buffers and places it into Chip Memory. The Command Stream stage is shared between the 3D and Media pipelines. |
Vertex Fetch | This stage is responsible for reading vertex data from Chip Memory, reformatting it, and writing the results into new vertex entries in the Chip Memory. |
Vertex Shader | The Vertex Shader stage is responsible for processing (shading) incoming vertices by passing them to vertex shader threads. Each thread corresponds to a kernel program that performs one or more of the following operations: |
Clip Unit | The functions of this stage are performed in two parts. Initially the fixed function portion of the Clip Unit is responsible for categorizing the input primitive into one of three states: |
Strip/Fan Unit | The functions of this stage are performed in two parts. Initially the fixed function portion of the Strip/Fan Unit is responsible for: |
Windower/Masker | The functions of this stage are performed in two parts. Initially the fixed function portion of the Windower/Masker Unit performs primitive rasterization. It then spawns a pixel shader thread to shade the primitive’s pixels. |
Sampler Unit | This unit provides the capability of advanced sampling and filtering of texture surfaces in memory. It performs the following functions: |
Business SKU vs. Consumer SKU
Because different users have different graphics needs, this generation of graphics solutions comes in a variety of SKUs. Business users typically do not run 3D graphics intensive applications and therefore may not require the same capabilities as someone who plays videogames.
SKU | GMA 3000 | GMA X3000 |
Number of Execution Units | 8 EUs (667MHz) | 8 EUs (667MHz) |
Memory Channels | 2 | 2 |
DDR2/ECC Support | DDR2-800 | DDR2-800 |
Vertex Shader | CPU | GPU |
Early Z | Yes | Improved |
Dependent Textures | Yes | Improved |
Occlusion Query | No | Yes |
DX9 Shader Model (SM) | SM2.0 | SM3.0 |
Floating RT/Blend | No | Yes |
High Definition Playback | MP2 | MP2 / VC1 |
Performance Tips When Working With GMA X3000
GMA X3000 tends to be fill rate limited when compared to other much more expensive solutions. The tips below describe ways to reduce fill rate and memory bandwidth requirements.
TIP: Provide users with options that reduce fill rate requirements.
WHY: GMA X3000 (Gen4) tends to be fill rate limited when compared to other much more expensive solutions.
HOW: The simplest way to handle this is by giving users the option of selecting lower resolutions and fewer fill rate intensive effects such as shadows. Better solutions that improve the user experience are detailed below.
TIP: Favor advanced single-pass shaders over simple multi-pass shaders
WHY: GMA X3000 tends to be more fill rate limited than compute limited. This means that when given the choice between a simpler shader with multiple passes and a more advanced shader with a single pass, you should favor the advanced single-pass shader. GMA X3000 natively supports advanced shader models such as SM3.0 and executes them efficiently, whereas it tends to be fill rate limited when doing multiple passes with simple shaders.
TIP: Leverage Early Z feature to reduce fill requirements
WHY: The GMA X3000 hardware can perform a Z test on a pixel before it is sent to the pixel shader or the render target; this feature is referred to as “Early Z”. If the fragment fails the Z test it can be immediately discarded thus eliminating any additional texture or frame buffer accesses.
HOW: Early Z works automatically whenever possible. To maximize its benefit, avoid manipulating the Z buffer in pixel shaders when a standard Z test would work. If no Z test can be performed before the pixel shader runs, Early Z cannot skip the pixel shader. Another common method of boosting Early Z performance is to render the scene from front to back. If this can be done, it effectively reduces the depth complexity to one and thus saves substantial fill rate. However, the benefits of this approach need to be balanced against the additional state changes and texture cache misses that may be incurred by forcing a total front-to-back ordering. Because of this, it is often better to only group and sort objects that share common textures and other rendering states. Keep in mind that front-to-back rendering and Early Z offer little benefit when depth complexity is low and/or when using very simple single-texture shaders.
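One way to apply the grouping advice is to sort draw calls by render state first and by depth only within each state group. The sketch below assumes a hypothetical `DrawItem` record in which `stateKey` identifies a shared texture and render-state combination and `viewDepth` is the distance from the camera; a real engine would use its own structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw record: stateKey stands for a shared texture/render-state
// combination, viewDepth for the distance from the camera.
struct DrawItem {
    int stateKey;
    float viewDepth;
};

// Group items by render state first, then order each group front to back so
// that nearer objects fill the Z buffer before farther ones are drawn.
inline void sortForEarlyZ(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) -> bool {
                  if (a.stateKey != b.stateKey) return a.stateKey < b.stateKey;
                  return a.viewDepth < b.viewDepth;
              });
}
```

Sorting on the state key first keeps state changes to one per group, at the cost of a front-to-back order that is only approximate across groups.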
TIP: Use the GMA X3000 (Gen4) occlusion query
WHY: GMA X3000 supports the D3D occlusion query capability, which can be used to reduce overdraw.
HOW: See IDirect3DDevice9::CreateQuery(). This query capability can be used to count the number of pixels that passed the Z test. With this you can determine if an object is potentially visible by rendering its bounding box. If the query returns zero, then the bounding box was not visible, so there is no need to render the object. This can provide a very large performance boost when it is used on complex objects such as trees that contain hundreds of branches and leaves. The technique should not be used on simple objects, since the bounding box test could be more expensive than simply rendering the object. When rendering the bounding box, be sure to turn off both Z writes and color writes.
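The bounding box test can be pictured with a small CPU-side model. The sketch below is not Direct3D code: `DepthBuffer`, `drawOccluder`, and `countPassingPixels` are hypothetical stand-ins, and the bounding box is assumed to project to a screen-space rectangle at a single depth. In a Direct3D 9 application the count would instead come from a query created with D3DQUERYTYPE_OCCLUSION and read back with IDirect3DQuery9::GetData.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side stand-in for a depth buffer. Smaller depth values are closer to
// the camera, and the buffer is cleared to the far plane (1.0).
struct DepthBuffer {
    int width;
    int height;
    std::vector<float> depth;

    DepthBuffer(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 1.0f) {}

    std::size_t index(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
               static_cast<std::size_t>(x);
    }
};

// Draws an occluder covering [x0, x1) x [y0, y1) at depth z with Z writes on.
inline void drawOccluder(DepthBuffer& buffer, int x0, int y0, int x1, int y1,
                         float z) {
    for (int y = y0; y < y1; ++y) {
        for (int x = x0; x < x1; ++x) {
            float& stored = buffer.depth[buffer.index(x, y)];
            if (z < stored) stored = z;
        }
    }
}

// Models the query: counts bounding-rectangle pixels that pass the Z test.
// The buffer is const, mirroring "Z writes and color writes off".
inline int countPassingPixels(const DepthBuffer& buffer, int x0, int y0,
                              int x1, int y1, float z) {
    int passed = 0;
    for (int y = y0; y < y1; ++y) {
        for (int x = x0; x < x1; ++x) {
            if (z < buffer.depth[buffer.index(x, y)]) ++passed;
        }
    }
    return passed;
}

// The object is worth drawing only if at least one pixel passed.
inline bool isPotentiallyVisible(int passedPixels) { return passedPixels > 0; }
```

A count of zero means the complex object behind the box can be skipped. Any non-zero count means it has to be rendered normally.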
TIP: Consider Z only pass followed by color pass
WHY: When using long complex pixel shaders it’s important to minimize the number of pixels rendered so that total shader compute time stays within reason.
HOW: One way of doing this is to do an initial Z only pass (meaning no color buffer writes or pixel shader execution) for all objects in the scene. Then do a second pass with shading turned on. The early Z test will then eliminate work on all non visible pixels. This approach should not be used with simple single texture shaders since the cost of two passes can easily exceed the relatively low cost of rendering with a simple shader.
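The saving can be estimated by counting pixel shader executions for a single screen pixel. The sketch below is a simplified model with hypothetical function names: it assumes an Early Z test that rejects fragments before shading, depths in the range [0, 1), and a less-or-equal test in the color pass.

```cpp
#include <cassert>
#include <vector>

// Single pass: a fragment is shaded whenever it is nearer than everything
// drawn so far at this pixel, so submission order matters.
inline int shadedInSinglePass(const std::vector<float>& fragmentDepths) {
    float nearest = 1.0f;  // cleared depth (far plane)
    int shaded = 0;
    for (float z : fragmentDepths) {
        assert(z >= 0.0f && z < 1.0f);
        if (z < nearest) {
            nearest = z;
            ++shaded;
        }
    }
    return shaded;
}

// Two passes: the first writes depth only (no shading); the second shades
// only fragments whose depth matches the stored nearest depth.
inline int shadedWithZPrepass(const std::vector<float>& fragmentDepths) {
    float nearest = 1.0f;
    for (float z : fragmentDepths) {
        assert(z >= 0.0f && z < 1.0f);
        if (z < nearest) nearest = z;
    }
    int shaded = 0;
    for (float z : fragmentDepths) {
        if (z <= nearest) ++shaded;
    }
    return shaded;
}
```

For four overlapping fragments submitted back to front, the single pass shades all four while the Z prepass shades one. Submitted front to back, the single pass already shades only one, which is why sorting and the Z-only pass address the same cost.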
TIP: Combine the above methods for maximum benefit
WHY: These methods can work together to produce greater fill rate savings.
HOW: The Z only pass can be combined with front to back sorting to reduce Z fill requirements. Since this is a Z only pass you do not need to be concerned about state change overhead induced by sorting. Adding occlusion query with the bounding box test to the Z only pass for complex objects will further improve results. When objects are determined to be not visible in the Z pass be sure to flag them so that they can be skipped in the final color pass as well.
TIP: Reduce memory bandwidth
WHY: This will increase application performance and interactivity.
HOW: A few methods to reduce memory bandwidth: use compressed textures; use D3DPOOL_MANAGED or D3DPOOL_DEFAULT when allocating surface, buffer, or texture memory; reduce texture size or quality; use level-of-detail; and reduce the content footprint by employing efficient culling algorithms.
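To get a feel for how much compressed textures save, the storage for one mip level can be computed directly. The sketch below uses the standard DXT block sizes (4x4 texel blocks, 8 bytes per block for DXT1 and 16 bytes for DXT3 and DXT5) and compares them with a 32-bit uncompressed format; the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class TextureFormat { Rgba8, Dxt1, Dxt3, Dxt5 };

// Bytes needed for one mip level of the given size.
// Rgba8 stores 4 bytes per texel. DXT formats store 4x4 texel blocks:
// 8 bytes per block for DXT1, 16 bytes per block for DXT3 and DXT5.
inline std::size_t textureLevelBytes(TextureFormat format, std::size_t width,
                                     std::size_t height) {
    assert(width > 0 && height > 0);
    if (format == TextureFormat::Rgba8) return width * height * 4;
    const std::size_t blocksWide = (width + 3) / 4;   // round up to whole blocks
    const std::size_t blocksHigh = (height + 3) / 4;
    const std::size_t bytesPerBlock = (format == TextureFormat::Dxt1) ? 8 : 16;
    return blocksWide * blocksHigh * bytesPerBlock;
}
```

A 1024 x 1024 level takes 4 MB as 32-bit RGBA, 1 MB as DXT5, and 512 KB as DXT1, so every fetch from such a texture moves proportionally less data across the shared memory bus.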
Programming for the GMA X3000
Creating a DX9 Device and Identifying GMA X3000
Following is a code snippet that shows one way to initialize the GMA X3000 device. It shows how users can check the presence of Hardware T&L feature and turn it ON if it is available. Alternatively the sample shows how to switch to Software vertex processing for legacy integrated graphics hardware.
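The sketch below shows only the decision at the heart of that initialization, written as portable C++ so it compiles without the DirectX SDK. The constants are local stand-ins that mirror the values of D3DDEVCAPS_HWTRANSFORMANDLIGHT, D3DCREATE_SOFTWARE_VERTEXPROCESSING, and D3DCREATE_HARDWARE_VERTEXPROCESSING from the DirectX 9 headers, and `chooseVertexProcessing` is a hypothetical helper. In a real application the caps value would come from the DevCaps field filled in by IDirect3D9::GetDeviceCaps, and the result would be passed as the behavior flags to IDirect3D9::CreateDevice.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins mirroring values from the DirectX 9 SDK headers; a Windows
// build would include <d3d9.h> and use the SDK names directly.
constexpr std::uint32_t kDevCapsHwTransformAndLight = 0x00010000;      // D3DDEVCAPS_HWTRANSFORMANDLIGHT
constexpr std::uint32_t kCreateSoftwareVertexProcessing = 0x00000020;  // D3DCREATE_SOFTWARE_VERTEXPROCESSING
constexpr std::uint32_t kCreateHardwareVertexProcessing = 0x00000040;  // D3DCREATE_HARDWARE_VERTEXPROCESSING

// Hypothetical helper: picks the device behavior flag from the reported caps.
// Hardware T&L is turned on when available; otherwise fall back to software
// vertex processing, as on legacy integrated graphics hardware.
inline std::uint32_t chooseVertexProcessing(std::uint32_t devCaps) {
    return (devCaps & kDevCapsHwTransformAndLight)
               ? kCreateHardwareVertexProcessing
               : kCreateSoftwareVertexProcessing;
}
```

Keeping the choice in one function makes it easy to force the software path during testing, which helps expose application defects that only appear with software vertex processing.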
Device Capabilities
Listing all of the device capabilities supported by any piece of graphics hardware is a very large undertaking. The easiest way to examine a particular graphics device capability is to use the DirectX Caps Viewer, available after the DX SDK is installed on a system. This provides detailed information for each capability of the graphics card when running a DirectX driver. When using this mode, be sure to select the appropriate adapter format that represents the mode in which the graphics driver is running. For example, D3DFMT_X8R8G8B8, full screen, windowed, and so on.
Figure 4. A snapshot of the DirectX Caps viewer. On the left is a list of capability topics. On the right are the detailed caps bits reported by this hardware.
Checking for Available Memory
A check that is often performed before actually executing the application is the amount of available free graphics or video memory. As a result of the dynamic allocation of graphics memory performed by the Intel Integrated Graphics devices (based on application requests), you need to take care in ensuring that you understand all of the memory that is truly available to the graphics device. Memory checks that only supply the amount of 'local' graphics memory available do not supply an appropriate value for the Intel Integrated Graphics devices. To accurately detect the amount of memory available to the Intel Integrated Graphics devices, check the total video memory availability. All video memory on GMA X3000, even the dynamically allocated DVMT memory, is considered to be “Local Memory”.
“Non-Local Video Memory” will show as ZERO (0). This should not be used to determine “AGP” or “PCI Express” compatibility.
The code snippet below outlines the function calls necessary to most accurately check the memory available for use by the graphics controller within DirectX 9:
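As a portable sketch of the check itself, the code below compares an application's requirement against total video memory rather than local memory alone. `VideoMemoryReport` and both helpers are hypothetical; how the two totals are obtained is platform specific (on Windows, one possible source is IDirectDraw7::GetAvailableVidMem queried once for local and once for non-local video memory).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical report of video memory totals in bytes. The values are assumed
// to come from a platform query; they are not computed here.
struct VideoMemoryReport {
    std::uint64_t localBytes;
    std::uint64_t nonLocalBytes;
};

// Total video memory: local plus non-local. On GMA X3000 the non-local part
// is reported as zero, so the total equals the local amount.
inline std::uint64_t totalVideoMemory(const VideoMemoryReport& report) {
    return report.localBytes + report.nonLocalBytes;
}

// Compares the requirement against the total, never against local alone.
inline bool hasEnoughVideoMemory(const VideoMemoryReport& report,
                                 std::uint64_t requiredBytes) {
    return totalVideoMemory(report) >= requiredBytes;
}
```

Because the comparison uses the total, the same check also behaves sensibly on devices that report most of their memory as non-local.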
High Level Shading Languages and GMA X3000
GMA 3000 supports DirectX 9.0C/Ex, including the version of the HLSL compiler that ships with the DirectX 9.0C/Ex distribution.
Troubleshooting GMA X3000 Issues
There are several tools available from Microsoft and other vendors to assist in troubleshooting X3000 issues. Microsoft’s DirectX* debug runtime will send extremely useful output to your debugger. In many cases, it will even give suggestions to improve performance.
How to File a Suspected Driver Bug or Feature Request
Video-miniport, display, and 3D acceleration drivers are constantly evolving, with new features and performance improvements. Because of the complexity of these drivers, bugs will crop up from time to time. Similarly, the graphics architecture team is making decisions on what features to include in the next product, and your feedback as a customer is of tremendous value.
It is important to note that many bugs that developers believe to be driver bugs turn out to be configuration defects in their application and would occur on any hardware with similar capability. For example, if a bug appears only on Intel hardware, the problem may be an application defect that is only exposed when utilizing software vertex processing. Forcing third-party video cards to use capabilities similar to Intel devices will often expose application defects.
Should you have a feature request or a bug report to file, your first contact should be your company’s main Intel Corporation contact, typically known as a Strategic Relations Manager (SRM) or Developer Relations Manager (DRM). If your company does not have one, it is very likely that your publisher does. This should be the same person you turn to for CPU information and optimizations.
Your contact is part of a team dedicated to Software Solutions for Intel products. Once their investigation gives them reason to suspect that the problem is caused by Intel hardware or software (drivers), they will work with the Intel® Desktop Platforms Group ISV Enabling team, which will debug the problem.
Web site and Engineering support
Software developers can go to the forum at Intel® Integrated Graphics Forum and post questions/comments about the complete line of Intel’s Integrated Graphics chipset solutions.
If you are a game programmer, many useful documents including everything from multithreading to audio, are available at http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/games/index.htm .
Miscellaneous
Chipset Device IDs
Chipset | Device ID 0 | Device ID 1 |
Intel® 915G | 2582 | 2782 |
Intel® 915GM | 2592 | 2792 |
Intel® 945G | 2772 | 2776 |
Intel® 945GM | 27A2 | 27A6 |
Intel® G965 | 29A2 | 29A3 |
Intel® GM965 | 2A02 | 2A03 |
Intel® 946GZ | 2972 | 2973 |
Intel® Q965/Q963 | 2992 | 2993 |
Intel® Q35 | 29B2 | 29B3 |
Intel® G33/G31 | 29C2 | 29C3 |
Intel® Q33 | 29D2 | 29D3 |
Note: Device IDs are in Hexadecimal format.
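These IDs can be used to recognize the hardware at run time. The following is a minimal sketch that matches the G965 and GM965 rows of the table; `isG965Family` is a hypothetical helper, and 0x8086 is Intel's PCI vendor ID. In a Direct3D 9 application, the two values would typically be read from the VendorId and DeviceId fields of the D3DADAPTER_IDENTIFIER9 structure returned by IDirect3D9::GetAdapterIdentifier.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kIntelVendorId = 0x8086;  // Intel's PCI vendor ID

// Hypothetical helper: true when the IDs match the G965 or GM965 rows of the
// device ID table above.
inline bool isG965Family(std::uint16_t vendorId, std::uint16_t deviceId) {
    if (vendorId != kIntelVendorId) return false;
    switch (deviceId) {
        case 0x29A2:  // Intel G965, Device ID 0
        case 0x29A3:  // Intel G965, Device ID 1
        case 0x2A02:  // Intel GM965, Device ID 0
        case 0x2A03:  // Intel GM965, Device ID 1
            return true;
        default:
            return false;
    }
}
```

Device ID checks are best used only to choose defaults. Capability bits remain the more reliable way to decide which rendering paths to enable.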