Measure GPU Performance - MATLAB & Simulink Example (2024)

This example shows how to measure some of the key performance characteristics of a GPU.

GPUs can be used to speed up certain types of computations. However, GPU performance varies widely between different GPU devices. In order to quantify the performance of a GPU, three tests are used:

  • How quickly can data be sent to the GPU or read back from it?

  • How fast can the GPU kernel read and write data?

  • How fast can the GPU perform computations?

After measuring these, the performance of the GPU can be compared to the host CPU. This provides a guide as to how much data or computation is required for the GPU to provide an advantage over the CPU.

Setup

gpu = gpuDevice();
fprintf('Using an %s GPU.\n', gpu.Name)
Using an NVIDIA RTX A5000 GPU.
sizeOfDouble = 8; % Each double-precision number needs 8 bytes of storage
sizes = power(2, 14:28);

Testing host/GPU bandwidth

The first test estimates how quickly data can be sent to and read from the GPU. Because the GPU is plugged into the PCI bus, this largely depends on how fast the PCI bus is and how many other things are using it. However, there are also some overheads that are included in the measurements, particularly the function call overhead and the array allocation time. Since these are present in any "real world" use of the GPU, it is reasonable to include these.

In the following tests, memory is allocated and data is sent to the GPU using the gpuArray function. Memory is allocated and data is transferred back to host memory using gather.

Note that the GPU used in this test supports PCI Express® version 4.0, which has a theoretical bandwidth of 1.97 GB/s per lane. For the 16-lane slots used by NVIDIA® compute cards, this gives a theoretical 31.52 GB/s.
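The theoretical figures quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the 16 GT/s per-lane transfer rate and the 128b/130b line encoding are standard PCI Express 4.0 values assumed here rather than taken from this example, and the variable names are illustrative.

```matlab
% Assumed PCIe 4.0 link parameters (not stated in the example itself)
transfersPerSecond = 16e9;       % 16 GT/s per lane
encodingEfficiency = 128/130;    % 128b/130b line encoding
numLanes = 16;

% Convert bits per second to gigabytes per second
perLaneGBs = transfersPerSecond * encodingEfficiency / 8 / 1e9;
totalGBs = perLaneGBs * numLanes;

fprintf('Per lane: %.2f GB/s, x16 slot: %.2f GB/s\n', perLaneGBs, totalGBs)
```

With these assumptions the script prints about 1.97 GB/s per lane and 31.51 GB/s for 16 lanes. The 31.52 GB/s quoted in the text comes from rounding the per-lane value to 1.97 before multiplying by 16.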

sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    hostData = randi([0 9], numElements, 1);
    gpuData = randi([0 9], numElements, 1, 'gpuArray');
    % Time sending to GPU
    sendFcn = @() gpuArray(hostData);
    sendTimes(ii) = gputimeit(sendFcn);
    % Time gathering back from GPU
    gatherFcn = @() gather(gpuData);
    gatherTimes(ii) = gputimeit(gatherFcn);
end
sendBandwidth = (sizes./sendTimes)/1e9;
[maxSendBandwidth,maxSendIdx] = max(sendBandwidth);
fprintf('Achieved peak send speed of %g GB/s\n',maxSendBandwidth)
Achieved peak send speed of 9.5407 GB/s
gatherBandwidth = (sizes./gatherTimes)/1e9;
[maxGatherBandwidth,maxGatherIdx] = max(gatherBandwidth);
fprintf('Achieved peak gather speed of %g GB/s\n',max(gatherBandwidth))
Achieved peak gather speed of 4.1956 GB/s

On the plot below, the peak for each case is circled. With small data set sizes, overheads dominate. With larger amounts of data the PCI bus is the limiting factor.

semilogx(sizes, sendBandwidth, 'b.-', sizes, gatherBandwidth, 'r.-')
hold on
semilogx(sizes(maxSendIdx), maxSendBandwidth, 'bo-', 'MarkerSize', 10);
semilogx(sizes(maxGatherIdx), maxGatherBandwidth, 'ro-', 'MarkerSize', 10);
grid on
title('Data Transfer Bandwidth')
xlabel('Array size (bytes)')
ylabel('Transfer speed (GB/s)')
legend('Send to GPU', 'Gather from GPU', 'Location', 'NorthWest')
hold off

[Figure: Data Transfer Bandwidth. Transfer speed (GB/s) versus array size (bytes) for send to GPU and gather from GPU.]

Testing memory intensive operations

Many operations do very little computation with each element of an array and are therefore dominated by the time taken to fetch the data from memory or to write it back. Functions such as ones, zeros, nan, and true only write their output, whereas functions like transpose and tril both read and write but do no computation. Even simple element-wise operators like plus, minus, and times do so little computation per element that they are bound only by the memory access speed.

The function plus performs one memory read and one memory write for each floating point operation. It should therefore be limited by memory access speed and provides a good indicator of the speed of a read+write operation.

memoryTimesGPU = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    gpuData = randi([0 9], numElements, 1, 'gpuArray');
    plusFcn = @() plus(gpuData, 1.0);
    memoryTimesGPU(ii) = gputimeit(plusFcn);
end
memoryBandwidthGPU = 2*(sizes./memoryTimesGPU)/1e9;
[maxBWGPU, maxBWIdxGPU] = max(memoryBandwidthGPU);
fprintf('Achieved peak read+write speed on the GPU: %g GB/s\n',maxBWGPU)
Achieved peak read+write speed on the GPU: 659.528 GB/s

Now compare it with the same code running on the CPU.

memoryTimesHost = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    hostData = randi([0 9], numElements, 1);
    plusFcn = @() plus(hostData, 1.0);
    memoryTimesHost(ii) = timeit(plusFcn);
end
memoryBandwidthHost = 2*(sizes./memoryTimesHost)/1e9;
[maxBWHost, maxBWIdxHost] = max(memoryBandwidthHost);
fprintf('Achieved peak read+write speed on the host: %g GB/s\n',maxBWHost)
Achieved peak read+write speed on the host: 71.0434 GB/s
% Plot CPU and GPU results.
semilogx(sizes, memoryBandwidthGPU, 'b.-', ...
    sizes, memoryBandwidthHost, 'r.-')
hold on
semilogx(sizes(maxBWIdxGPU), maxBWGPU, 'bo-', 'MarkerSize', 10);
semilogx(sizes(maxBWIdxHost), maxBWHost, 'ro-', 'MarkerSize', 10);
grid on
title('Read+write Bandwidth')
xlabel('Array size (bytes)')
ylabel('Speed (GB/s)')
legend('GPU', 'Host', 'Location', 'NorthWest')
hold off

[Figure: Read+write Bandwidth. Speed (GB/s) versus array size (bytes) for GPU and host.]

Comparing this plot with the data-transfer plot above, it is clear that GPUs can typically read from and write to their memory much faster than they can get data from the host. It is therefore important to minimize the number of host-GPU or GPU-host memory transfers. Ideally, programs should transfer the data to the GPU, then do as much with it as possible while on the GPU, and bring it back to the host only when complete. Even better would be to create the data on the GPU to start with.
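The advice above can be sketched as follows. This is a minimal illustration, assuming a supported GPU and Parallel Computing Toolbox are available; the array length and the particular element-wise operations are arbitrary choices, not part of the original example.

```matlab
if canUseGPU
    n = 2^24;                        % illustrative array length

    % Pattern to avoid where possible: create on the host, then copy over
    xHost = rand(n, 1);
    tTransfer = gputimeit(@() gpuArray(xHost));

    % Preferred: create the data directly in GPU memory
    tCreate = gputimeit(@() rand(n, 1, 'gpuArray'));

    % Keep intermediate results on the GPU and gather only the final scalar
    x = rand(n, 1, 'gpuArray');
    y = sqrt(x) + 2*x;               % computed and stored on the GPU
    total = gather(sum(y));          % one small transfer back to the host

    fprintf('Copy from host: %.4f s, create on GPU: %.4f s\n', ...
        tTransfer, tCreate)
else
    disp('No supported GPU found; skipping the GPU sketch.')
end
```

The timings depend on the device and on the PCI bus, so the printed values are only indicative for the machine they were measured on.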

Testing computationally intensive operations

For operations where the number of floating-point computations performed per element read from or written to memory is high, the memory speed is much less important. In this case the number and speed of the floating-point units is the limiting factor. These operations are said to have high "computational density".

A good test of computational performance is a matrix-matrix multiply. For multiplying two N×N matrices, the total number of floating-point calculations is

$\mathrm{FLOPS}(N) = 2N^3 - N^2.$

Two input matrices are read and one resulting matrix is written, for a total of $3N^2$ elements read or written. This gives a computational density of $(2N - 1)/3$ FLOP/element. Contrast this with plus as used above, which has a computational density of 1/2 FLOP/element.
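The operation count follows from each of the N-by-N output elements needing N multiplications and N - 1 additions. As a rough illustration, the density can be evaluated for a few matrix sizes; the values of N below are arbitrary examples, not the sizes used in the benchmark.

```matlab
N = [64 1024 4096];                     % illustrative matrix sizes
flopCount = 2*N.^3 - N.^2;              % floating-point operations for A*B
elementsMoved = 3*N.^2;                 % two inputs read, one output written
density = flopCount ./ elementsMoved;   % equals (2N - 1)/3

for k = 1:numel(N)
    fprintf('N = %4d: %.1f FLOP/element\n', N(k), density(k))
end
```

Even at N = 64 the density is about 42 FLOP/element, far above the 1/2 FLOP/element of plus, which is why matrix multiplication is limited by arithmetic throughput rather than by memory speed.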

sizes = power(2, 12:2:24);
N = sqrt(sizes);
mmTimesHost = inf(size(sizes));
mmTimesGPU = inf(size(sizes));
for ii=1:numel(sizes)
    % First do it on the host
    A = rand( N(ii), N(ii) );
    B = rand( N(ii), N(ii) );
    mmTimesHost(ii) = timeit(@() A*B);
    % Now on the GPU
    A = gpuArray(A);
    B = gpuArray(B);
    mmTimesGPU(ii) = gputimeit(@() A*B);
end
mmGFlopsHost = (2*N.^3 - N.^2)./mmTimesHost/1e9;
[maxGFlopsHost,maxGFlopsHostIdx] = max(mmGFlopsHost);
mmGFlopsGPU = (2*N.^3 - N.^2)./mmTimesGPU/1e9;
[maxGFlopsGPU,maxGFlopsGPUIdx] = max(mmGFlopsGPU);
fprintf(['Achieved peak calculation rates of ', ...
    '%1.1f GFLOPS (host), %1.1f GFLOPS (GPU)\n'], ...
    maxGFlopsHost, maxGFlopsGPU)
Achieved peak calculation rates of 354.4 GFLOPS (host), 414.0 GFLOPS (GPU)

Now plot it to see where the peak was achieved.

semilogx(sizes, mmGFlopsGPU, 'b.-', sizes, mmGFlopsHost, 'r.-')
hold on
semilogx(sizes(maxGFlopsGPUIdx), maxGFlopsGPU, 'bo-', 'MarkerSize', 10);
semilogx(sizes(maxGFlopsHostIdx), maxGFlopsHost, 'ro-', 'MarkerSize', 10);
grid on
title('Double precision matrix-matrix multiply')
xlabel('Matrix size (numel)')
ylabel('Calculation Rate (GFLOPS)')
legend('GPU', 'Host', 'Location', 'NorthWest')
hold off

[Figure: Double precision matrix-matrix multiply. Calculation rate (GFLOPS) versus matrix size (numel) for GPU and host.]

Conclusions

These tests reveal some important characteristics of GPU performance:

  • Transfers from host memory to GPU memory and back are relatively slow.

  • A good GPU can read/write its memory much faster than the host CPU can read/write its memory.

  • Given large enough data, GPUs can perform calculations much faster than the host CPU.

It is notable that in each test quite large arrays were required to fully saturate the GPU, whether limited by memory or by computation. GPUs provide the greatest advantage when working with millions of elements at once.
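To give a feel for how much work is needed before a round trip to the GPU pays off, the peak rates reported in this example can be combined in a simple cost model. This is only a back-of-the-envelope sketch: it assumes large arrays running at the peak rates measured above, ignores call overheads, and treats every operation as a plus-like read+write pass. The model and its variable names are illustrative, not part of the original benchmark.

```matlab
% Peak rates reported earlier in this example (GB/s)
sendGBs = 9.5407;       % host to GPU
gatherGBs = 4.1956;     % GPU to host
gpuGBs = 659.528;       % GPU read+write
hostGBs = 71.0434;      % host read+write

% Cost in nanoseconds per byte of array data (1/(GB/s) = ns/byte)
transferCost = 1/sendGBs + 1/gatherGBs;   % one send plus one gather
gpuOpCost = 2/gpuGBs;                     % one read+write pass on the GPU
hostOpCost = 2/hostGBs;                   % one read+write pass on the host

% Number of passes k at which transferCost + k*gpuOpCost < k*hostOpCost
breakEvenOps = ceil(transferCost / (hostOpCost - gpuOpCost));
fprintf('Roughly %d memory-bound passes are needed to amortize the transfers.\n', ...
    breakEvenOps)
```

Under these assumptions, about 14 memory-bound passes over the data are needed before sending it to the GPU and gathering it back becomes worthwhile, which is consistent with the advice to do as much work as possible on the GPU between transfers.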

More detailed GPU benchmarks, including comparisons between different GPUs, are available in GPUBench on the MATLAB® Central File Exchange.

See Also

gpuArray | gputimeit

Related Topics

  • Measure and Improve GPU Performance
