Performance Debugging

Summary: How to debug frame rate problems and hitching in the Unreal Engine.

Document Changelog: Created and maintained by Mike Fricker.

Game Performance Overview

The following sections will help you understand the most useful tools for tracking down performance problems, bottlenecks and hitches.

Usually the first order of business is determining whether your frame is limited by CPU or GPU performance, and then using the appropriate tools to narrow down the problem.

If the CPU is running slowly there are many commands and debugging tools available to help with this. Sometimes there are simply too many actors and other "moving parts" causing a frame to be slow, or maybe a renegade AI class is firing thousands of ray casts per frame. With some patience you can track these perf issues right down to the individual level actor!

For GPU performance you may need to use PIX to analyze draw events or shader performance, as well as in-game visualization modes and HUD stats. Sometimes these problems can be traced to content layout or lighting properties.

In either case you may be dealing with a sustained frame rate problem or a hitch (a large frame time spike). For sustained poor frame rate, single-frame capture tools (such as the TRACE command) and sampling profilers (VTune, etc.) may be very useful. For hitches, you'll want to use the StatsViewer tool to analyze historical frames using a call graph.

Note that consoles will generally have different performance debugging tools than PC, although many of the Unreal tools will work on multiple platforms.

Debugging Performance: Step by step

Preparing to debug:

  1. First, make sure to always have STAT UNIT up while you're running the game
  2. Configure your build environment. Use ShippingDebugConsole (LTCG) builds when measuring performance, and Release builds when debugging performance
  3. Make sure to turn off any log spam or debug code that will taint performance results
  4. Turn off garbage collection verification because it will contribute to hitching
  5. Compile scripts in Final Release mode
  6. Disable VSync so you can identify bottlenecks, even on fast frames

Isolating the type of bottleneck:

  • Make use of STAT UNIT to determine why the frame is slow
    • You should be able to isolate whether the issue is a CPU or GPU problem

CPU - Tracking down slow C++ or UnrealScript code:

  • Turn on STAT SLOW to help identify hitches on the CPU
    • If you see hitches in ScriptTime when using STAT SLOW, then consider using StatsViewer or Script Profiling to drill down into the script timing data in gameplay code.
  • Capture profiling data for C++ code using platform CPU sampling tools.
  • Capture UnrealScript calls and Stat-based timing data for use with StatsViewer
    • This is great for tracking down hitches and script performance bugs!
  • Use Script Profiling to find hot spots in UnrealScript code.

GPU - Determining cause of poor rendering performance:

  • Use GPU profiling tools to locate slow draw calls, expensive shaders and abusive resource handling.

Guides to solving typical Unreal Engine performance problems:

Making sure that level content and assets are optimized:

  • Often, performance problems will need to be solved by altering or optimizing levels or assets.
  • See the content optimization section to learn about the tools available to help.

Performance Reference

Displaying Frame/Thread/GPU time (STAT UNIT)

Use the STAT UNIT command to determine where your frame time bottleneck is. STAT UNIT will toggle an on-screen HUD that displays Frame time, Game thread time, Render thread (Draw) time and GPU time (if possible.) This is an invaluable first step for tracking down performance problems -- you should pretty much always have this turned on.
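As a rough illustration of how the STAT UNIT readout is typically interpreted, the sketch below picks the largest of the Game, Draw and GPU times as the likely bottleneck. The function name and the "largest time wins" rule of thumb are assumptions made for this example; they are not part of the engine.

```python
def classify_bottleneck(game_ms, draw_ms, gpu_ms=None):
    """Return (name, milliseconds) for the largest STAT UNIT timing.

    Hypothetical helper: assumes the frame is limited by whichever of the
    game thread, render thread (Draw) or GPU takes the longest.
    """
    times = {"Game": game_ms, "Draw": draw_ms}
    # GPU time is not always available, so it is optional here.
    if gpu_ms is not None:
        times["GPU"] = gpu_ms
    name = max(times, key=times.get)
    return name, times[name]
```

For example, readings of Game 12 ms, Draw 9.5 ms and GPU 31 ms would point at the GPU, so GPU profiling tools are the next step.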

Also, you can use the STAT FPS command to display frame rate and frame time on the screen.

Build configuration

You'll almost certainly want to use either the Release or ShippingDebugConsole (LTCG) build configuration while testing performance. ShippingDebugConsole provides performance numbers close to those of a shipping title, but certain performance metrics and debugging features (such as Stats capturing) may not be available. With some minor tweaking, Release builds are satisfactory for performance and memory testing, as long as you compare results relative to one another rather than treating them as absolute numbers.

Note that if you'll be testing with a Release build, you should turn off any extraneous logging by either suppressing it with commands or #ifdef'ing it out at compilation time. For example, "suppress AILog" might turn off game-specific AI logging that would slow down C++ and script routines.

Turn off garbage collection verification

While working on performance you should always have GC Verification turned off, otherwise you can expect massive hitching in Release builds at least every 30 seconds or so. You can do this using one of the following methods:

  • In UnObjGC.cpp, make sure that VERIFY_DISREGARD_GC_ASSUMPTIONS is 0.
  • Pass the -NoVerifyGC command-line argument.

Compile scripts in Final Release mode

UnrealScript performance can be drastically different when logging and assertions are enabled. The easiest way to get performance that's closest to a shipping title is to compile scripts in Final Release mode, which automatically strips these expensive calls from the compiled byte code. You can build Final Release scripts through one of the following methods:

  • In Unreal Frontend, select the Cooking tab and enable the "Cook Final Release Scripts" checkbox. Now, when you run the cooker or launch the game through UFE, Final Release scripts will be compiled and used.
  • Pass the -Final_Release command-line parameter to the make commandlet.

Disabling VSync

For best results you may want to turn off VSync (which waits for vertical retrace before presenting frames). Otherwise the frame time will be padded to match the display's refresh rate, which can make it more difficult to get an accurate picture. To turn off VSync:

  • Pass the -NoVSync command-line option to the game.
  • Or, enable the "No VSync" checkbox in the Game tab of Unreal Frontend.
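To see why VSync hides the true cost of a frame, consider a simplified model in which each frame is presented at the next vertical retrace. The function below is a hypothetical sketch under that assumption (real presentation behavior depends on the platform and buffering scheme), with a 60 Hz display chosen only as an example.

```python
import math

def vsync_padded_ms(raw_frame_ms, refresh_hz=60.0):
    """Frame time as it would appear with VSync on, in a simplified model.

    Assumes every frame waits for the next vertical retrace, so the reported
    time is rounded up to a whole number of refresh intervals.
    """
    interval_ms = 1000.0 / refresh_hz
    return math.ceil(raw_frame_ms / interval_ms) * interval_ms
```

Under this model a 12 ms frame and a 16 ms frame both report about 16.7 ms, while an 18 ms frame jumps to about 33.3 ms, so small regressions are either invisible or wildly exaggerated until VSync is disabled.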

Using STAT SLOW

You can use the STAT SLOW command to help find performance spikes. It can help you narrow down hitches by reporting any cycle stats that run longer than a specific duration in a frame (10 ms by default.) Stats that run slowly will be displayed on the HUD for a little while, making it easier to correlate the spike with game behavior on screen.

To use it, enter STAT SLOW in the console, optionally followed by two arguments: the threshold in seconds (so 0.01 for 10 ms) and how long, in seconds, to keep rendering a stat after it has spiked. The default display duration is 10 seconds.

Example: STAT SLOW 0.01 10

This will render all cycle stats that have been > 10 ms in the last 10 seconds.
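The behavior described above can be modeled in a few lines. This is only a sketch of the idea (threshold plus display window), not the engine's implementation; the class name and the "Physics" stat are made up for illustration.

```python
class SlowStatTracker:
    """Toy model of STAT SLOW: remember stats that exceeded a threshold."""

    def __init__(self, threshold_s=0.01, display_s=10.0):
        self.threshold_s = threshold_s  # 0.01 s = 10 ms, as in the example
        self.display_s = display_s      # how long a spiked stat stays visible
        self._last_spike = {}           # stat name -> time of most recent spike

    def record(self, now_s, name, duration_s):
        # Only stats slower than the threshold are tracked at all.
        if duration_s > self.threshold_s:
            self._last_spike[name] = now_s

    def visible(self, now_s):
        # A stat stays listed until display_s seconds after its last spike.
        return sorted(name for name, t in self._last_spike.items()
                      if now_s - t <= self.display_s)
```

With the default arguments, a 25 ms ScriptTime spike recorded at time 0 is still listed at 5 seconds and has disappeared by 11 seconds, while a 4 ms stat is never shown.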

Profiling the CPU

After you've isolated your frame bottleneck as a CPU performance issue, one of the next steps should be to determine where in your C++ code the time is being spent. Additionally, you may want to look for hidden performance costs such as Load-Hit-Stores or cache misses.

Most platforms have robust CPU sampling tools available that can provide detailed code metrics and even call graphs for C++ code using data captured from a live session. For example, on the Windows platform you might use Intel VTune to capture sampling data over a short period to isolate hot spots. Console platforms have similar tools available, often included with the development tool chain.

You'll want to familiarize yourself with these applications and use them extensively throughout development!

Note that on some consoles, Unreal has some utility functions to help perform single-frame CPU/GPU sampling captures. You can type TRACE GAME to initiate a CPU trace, or TRACE RENDER to capture GPU data.

Most CPU profilers support capturing sampling data and/or call graph data over a short duration. This is great for debugging performance issues that involve a sustained poor frame rate. Sampling captures will usually give you function call hot spots in the C++ code. This can be useful sometimes, but more often than not you'll find that a Call Graph capture is much more useful!

Call graphs often have more capture-time overhead, but provide detailed information about the callers of hot spot functions, which can often lead you straight to the source of a problem! Even in cases where many objects/actors are calling a single function that shows up as slow, if you're lucky the call graph will lead you back to a particular actor class name or other pattern that hints you to the cause.

Sometimes call graphs will simply show UnrealScript taking up time (calls to ProcessEvent, CallFunction, etc), in which case you'll want to run a StatsViewer or Script Profiling session capture to drill down into those calls.

When dealing with frame hitches, CPU profilers can sometimes be unwieldy due to their relatively low sampling frequency. If the hitch is easily reproducible in game, a good strategy is to trigger the hitch as frequently as possible (ideally many times per frame) while capturing a short CPU profiling session. If the hitch is more elusive, you'll want to capture data for use in StatsViewer, which provides historical call graph data over many frames.
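A little arithmetic shows why a single hitch is easy to lose in a sampling capture and why repeating it helps. The helper names and all numbers below are illustrative assumptions, not measurements from any particular profiler.

```python
def expected_hitch_samples(hitch_ms, sample_interval_ms, occurrences=1):
    """Approximate number of samples that land inside the hitching code."""
    return occurrences * hitch_ms / sample_interval_ms

def hitch_share(hitch_ms, capture_ms, occurrences=1):
    """Fraction of the whole capture that is spent inside the hitch."""
    return occurrences * hitch_ms / capture_ms
```

Assuming a 40 ms hitch, a 1 ms sampling interval and a 10 second capture, one occurrence yields about 40 samples, or 0.4% of the capture, which is easily buried among ordinary hot spots. Triggering the same hitch 100 times during the capture raises its share to 40%, making it hard to miss.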

Profiling code using StatsViewer

You can use StatsViewer to help track down CPU performance problems. It also serves as a detailed profiling tool for UnrealScript code!

StatsViewer can display all UnrealScript function calls and game stats (value counters, cycle timers, etc) along a graph timeline where you can sort and view data, similar to how you would in PIX. More importantly, it can show you a nice hierarchical call graph of scoped cycle stats and script functions. This lets you quickly see "what's slow" for any given frame! (Just double click the frame in the graph window.)
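To make the idea of a hierarchical call graph of scoped cycle stats concrete, here is a minimal sketch of nested timing scopes that build a tree. It is not how the engine or StatsViewer is implemented; the class, the function and the injected clock are all hypothetical.

```python
from contextlib import contextmanager

class CallGraph:
    """Builds a tree of timed scopes, loosely mimicking scoped cycle stats."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in ms
        self.root = {"name": "Frame", "ms": 0.0, "children": []}
        self._stack = [self.root]

    @contextmanager
    def scope(self, name):
        node = {"name": name, "ms": 0.0, "children": []}
        self._stack[-1]["children"].append(node)  # attach to enclosing scope
        self._stack.append(node)
        start = self._clock()
        try:
            yield node
        finally:
            node["ms"] = self._clock() - start    # inclusive time of the scope
            self._stack.pop()

def format_graph(node, depth=0):
    """Return the tree as indented text lines, one per scope."""
    lines = ["  " * depth + f"{node['name']}: {node['ms']:.1f} ms"]
    for child in node["children"]:
        lines.extend(format_graph(child, depth + 1))
    return lines
```

Wrapping a piece of work in `with graph.scope("ScriptTime"):` inside an outer scope produces a nested entry, so a slow frame can be read top-down to find which child accounts for most of the parent's time.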

Preparation:

  • Make sure STATS is defined to 1 in the build you're using (UnBuild.h)
    • This is enabled by default in Debug and Release builds
  • To profile script code, also make sure STATS_SLOW is defined to 1 (UnStats.h)
    • This slows down profiling, but provides detailed call graph data for all UnrealScript calls

To capture stat data to disk (recommended):

  • Type "STAT StartFile" in the console to start capturing stats to disk
  • When you're done, type "STAT StopFile" to stop logging and finalize the stats file
  • To start capturing stats immediately on app startup, pass the "-StartStatsFile" parameter
  • On consoles, make sure to pass the -DisableHDDCache command-line option!
    • This turns off caching of texture mips to the HDD (which contests with stat file writing)
  • Stat files will be written to the \UnrealEngine3\\Profiling\ folder.
    • On consoles, the stats data will automatically be transferred to the PC through UnrealConsole
  • Run StatsViewer and load up the file (e.g. -07.15-18.58.ustats)

To capture live stats from a running game:

  • Load up the game
  • Connect the StatsViewer to the Xenon session using Connect to IP
    • On Xenon, use the Xenon's "Title IP Address", not the "Debug Channel IP Address"
      • You can find this in UFE by clicking "Show All Target Information"
    • Make sure the port number is set to 13002.
  • Live stats will start streaming in through a UDP connection!
  • You can save the captured data out to disk using File -> Save

Viewing stats data:

  • Load up a .ustats file in the StatsViewer tool, or connect to a live game session.
  • The interactive graph will show you frame times initially so you can see hitches and trending
  • Drag and drop stats from the left-hand column onto the graph to display the stat data
  • Click in the graph to select a frame and view stats data for that frame
  • Double-click in the graph to open the Call Graph for that frame!
  • Right click on stats in the left-hand column to "View Frames by Criteria" (e.g. only frames with FPS < 20, etc.)
  • Use the menu options to switch viewing modes (frame #s versus time, ranged/overall data)
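The "View Frames by Criteria" idea amounts to filtering the captured frames by a predicate. The sketch below applies the FPS < 20 example to a plain list of frame times; the function name and input format are assumptions for illustration and do not reflect the .ustats file format.

```python
def frames_below_fps(frame_times_ms, fps_limit=20.0):
    """Return indices of frames whose instantaneous FPS is below fps_limit."""
    return [index for index, ms in enumerate(frame_times_ms)
            if ms > 0 and 1000.0 / ms < fps_limit]
```

Given frame times of 16.7, 33.3, 55.0, 16.7 and 120.0 ms, only the 55 ms and 120 ms frames fall below 20 FPS, so those are the frames worth opening in the Call Graph view.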

Other notes:

  • Stats capture doesn't work in LTCG configurations (unless you #define STATS locally). Use Release instead.
  • Live stat capture is still somewhat buggy (drops frames, scrolling is a bit weird.)

For more information about real-time Stats capture, see this page.

   • GameplayProfiler: An overview of how to read and analyze data on the time spent in gameplay code.

Using Script Profiling to tune UnrealScript performance

Gameplay Profiler can provide lots of great information about expensive functions in your script code. It's extremely fast to capture profiling data and will give you instant access to hot spots over a short sampling duration.

To capture script profiler data:

  • Type PROFILEGAME START in the console to start capturing data
  • When you're done, type PROFILEGAME STOP to stop capturing and save the data to disk
  • Profiling data will be written to the \UnrealEngine3\\Profiling\ folder.
    • On consoles, the profiling data will automatically be transferred to the PC through UnrealConsole
  • Run GameplayProfiler and load up the file (e.g. -07.15-18.58.uprof)

See the Gameplay Profiler documentation for more info.

NOTE: In many cases the StatsViewer tool can actually provide more detailed script profiling data than the GameplayProfiler, at the cost of more runtime overhead. Just make sure to #define STATS_SLOW to 1 when capturing stats data, and StatsViewer will display full UnrealScript call graphs for frames!

Performance on consoles

Console-specific performance info may need to be on sub-pages.

For a list of all Stat Commands, please see the Stats Descriptions page.

For a list of all Console Commands, please see the Console Commands page.