Game Thread Profiling and Optimization

Overview

Every gameplay object added to the scene consumes some resources, and usually the most interesting objects consume the most. To have enough GameThread CPU time for those objects, make certain that other objects are not using more than their share and that every object does the minimum amount of work needed to accomplish its goal.

Gameplay Profiler

Gameplay Profiler can provide lots of great information about expensive functions in your script code. It's extremely fast to capture profiling data and will give you instant access to hot spots over a short sampling duration.

To capture script profiler data:

Type PROFILEGAME START in the console to start capturing data.
When you're done, type PROFILEGAME STOP to stop capturing and save the data to disk.
Profiling data is written to the [UnrealInstallation]\[GameName]\Profiling\ folder.
- On consoles, the stats data will automatically be transferred to the PC through UnrealConsole.
Run GameplayProfiler and load up the file (e.g. [GameName]-07.15-18.58.uprof).

See the Gameplay Profiler documentation for a more detailed overview and additional information.

NOTE: In many cases the Stats Viewer tool can actually provide more detailed script profiling data than the GameplayProfiler, at the cost of more runtime overhead. Just make sure to #define STATS_SLOW to 1 when capturing stats data, and StatsViewer will display full UnrealScript call graphs for frames!

Stats Viewer

You can use StatsViewer to help track down CPU performance problems. It also serves as a detailed profiling tool for UnrealScript code!

StatsViewer can display all UnrealScript function calls and game stats (value counters, cycle timers, etc) along a graph timeline where you can sort and view data, similar to how you would in PIX. More importantly, it can show you a nice hierarchical call graph of scoped cycle stats and script functions. This lets you quickly see "what's slow" for any given frame! (Just double click the frame in the graph window.)

Preparation:

Make sure STATS is defined to 1 in the build you're using (UnBuild.h).
- This is enabled by default in Debug and Release builds.
To profile script code, also make sure STATS_SLOW is defined to 1 (UnStats.h).
- This slows down profiling, but provides detailed call graph data for all UnrealScript calls.

To capture stat data to disk (recommended):

Type STAT StartFile in the console to start capturing stats to disk.
When you're done, type STAT StopFile to stop logging and finalize the stats file.
To start capturing stats immediately on app startup, pass the -StartStatsFile argument to the command line.
On consoles, make sure to pass the -DisableHDDCache command-line option!
- This turns off caching of texture mips to the HDD (which contests with stat file writing).
Stat files are written to the [UnrealInstallation]\[GameName]\Profiling\ folder.
- On consoles, the stats data will automatically be transferred to the PC through UnrealConsole.
Run StatsViewer and load up the file (e.g. [GameName]-07.15-18.58.ustats).

To capture live stats from a running game:

Load up the game.
Connect the StatsViewer to the Xenon session using Connect to IP.
- On Xenon, use the Xenon's "Title IP Address", not the "Debug Channel IP Address".
  - You can find this in UFE by clicking "Show All Target Information".
- Make sure the port number is set to 13002.
Live stats will start streaming in through a UDP connection!
You can save the captured data out to disk using File > Save.

Viewing stats data:

Load up a .ustats file in StatsViewer tool, or Connect to a live game session.
The interactive graph will show you frame times initially so you can see hitches and trending.
Drag and drop stats from the left-hand column onto the graph to display the stat data.
Click in the graph to select a frame and view stats data for that frame.
Double-click in the graph to open the Call Graph for that frame!
Right click on stats in the left-hand column to "View Frames by Criteria" (e.g. only frames with FPS < 20, etc.).
Use the menu options to switch viewing modes (frame #s versus time, ranged/overall data).

Other notes:

Doesn't work in LTCG modes (unless you #define STATS locally.) Use Release.
Live stat capture is still somewhat buggy (drops frames, scrolling is a bit weird).

For more information about real-time Stats capture, see the Performance Tracking System.

Level Profiling and Optimization

Dynamic Light Environment Updating

When profiling a level, dynamic light environment (DLE) updates may well show up as expensive. When that happens, take a look at which Actors in the level are using DLEs and what their settings are.

All InterpActors (aka Movers) and KActors have a dynamic light environment by default. When a DLE updates, it does line checks to light sources, which can add quite a lot of CPU cost. However, there are various options you can set that reduce the cost of lighting on dynamic objects.

Here are the light environment settings from most expensive to least (on the game thread) and when they should be used:

1) bEnabled=True, bDynamic=True (the default)

These should only be used where needed, they will update based on InvisibleUpdateTime and MinTimeBetweenFullUpdates. There probably shouldn't be more than 50 of these active at any given time. They do extra visibility checks when visible, close to a player or when they are moving.

2) bEnabled=True, bDynamic=False, bForceNonCompositeDynamicLights=True

These should be very cheap, the environment is updated on the first tick and never again. bForceNonCompositeDynamicLights is necessary to allow dynamic lights to affect them, which doesn't have any significant game thread overhead. There can be hundreds of these, the only cost (after the first tick) will be line checks to dynamic lights (and only when the owner is visible). These look better than using precomputed shadows because they can rotate and the lighting will still be correct. They are used by fractured meshes, GDO's, and some other things. If you guys see significant cost with these then it can probably be optimized quite a bit on the code side.

3) bEnabled=False, bUsePrecomputedShadows=True (on the primitive component). Also, you'll have to take it out of the dynamic channel and put it in the static lighting channel.

These will be lightmapped, very cheap to render and should have virtually no game thread overhead (except that the UDynamicLightEnvironmentComponent::Tick function is still called). They will look wrong when moved.

There is a console command that can be used to see how many light environments are active. Type SHOWLIGHTENVS at the console and you will get a list of all environments that got ticked that frame. The output should look like:

Log: LE: SP_MyMap_01_S.TheWorld:PersistentLevel.InterpActor_12.DynamicLightEnvironmentComponent_231 1
Log: LE: SP_MyMap_01_S.TheWorld:PersistentLevel.InterpActor_55.DynamicLightEnvironmentComponent_232 0
Log: LE: SP_MyMap_01_S.TheWorld:PersistentLevel.InterpActor_14.DynamicLightEnvironmentComponent_432 1 ...

A '1' at the end of the line means bDynamic is set to TRUE. This should be used to guide level designers to change their map to reduce update overhead.
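As a rough illustration of how that output could be used, the sketch below counts how many ticked environments report bDynamic as TRUE and compares the count against a budget, such as the figure of 50 suggested earlier. It is standalone C++, not engine code. LightEnvCounts, CountLightEnvs, and ExceedsDynamicBudget are hypothetical names, and the parser assumes each line has the shape shown in the sample output above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Tally of light environments found in SHOWLIGHTENVS output.
struct LightEnvCounts {
    int Dynamic = 0;     // trailing flag 1: bDynamic is TRUE
    int NonDynamic = 0;  // trailing flag 0
};

// Parses lines shaped like "Log: LE: <component path> <0|1>".
// Anything that does not match that shape is skipped.
LightEnvCounts CountLightEnvs(const std::vector<std::string>& Lines) {
    LightEnvCounts Counts;
    for (const std::string& Line : Lines) {
        std::istringstream Stream(Line);
        std::string LogTag, LeTag, Path, Flag;
        if (!(Stream >> LogTag >> LeTag >> Path >> Flag)) continue;
        if (LogTag != "Log:" || LeTag != "LE:") continue;
        if (Flag == "1") ++Counts.Dynamic;
        else if (Flag == "0") ++Counts.NonDynamic;
    }
    return Counts;
}

// True when more fully dynamic environments ticked than the budget allows.
bool ExceedsDynamicBudget(const LightEnvCounts& Counts, int MaxDynamic) {
    assert(MaxDynamic >= 0);
    return Counts.Dynamic > MaxDynamic;
}
```

Feeding it the lines captured from the log for one frame gives a quick number to hand to level designers, rather than a long list to read by eye.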

Loading Performance

Sometimes levels take a long time to stream in. Usually some specific object is the cause of the long stream-in times. We just need to find that object and see why!

There are a number of defines in the engine that deal with loading/streaming:

TRACK_SERIALIZATION_PERFORMANCE
TRACK_DETAILED_ASYNC_STATS
TRACK_FILEIO_STATS

Turn those all on and then load the level and you will get a bunch of stats in the log file that will show you which objects are taking the most time.

Particles

There are usually lots of particles in a scene and not all of them have the same costs. We would like to know if there are any particles that are significantly more costly than others. Additionally, we want to know if there are any particles doing work when they don't need to be.

Setting #define TRACK_DETAILED_PARTICLE_TICK_STATS 1 will output a number of particle tick stats to the log.

Additionally, you will want to look for particles that are updating their bounds every frame. The ContentAudit will have tagged those ParticleSystems, but if you ever see particles having their bounds dynamically updated each frame, they are always good candidates for a FixedRelativeBoundingBox.

Generally, smoke / splash effects that occlude your view of the level make up the majority of slow particle systems. These cost more than hit effects and sparks because they are large on the screen. The single most effective way to optimize these is to use fewer particles that are more opaque. Simplifying the material can have some small gains too, but nothing like reducing the number of layers.

Distortion has quite a bit of constant overhead in each DPG that it is used in. When you make an effect like a screen damage or blood effect that is placed in the foreground DPG (either by you or gameplay code), it adds this constant overhead to the foreground DPG as well as the world DPG. You can avoid this by doing the refraction effect using a scene texture lookup in the material. Using scene texture will change the sort order, but that won't matter for an effect in the foreground DPG. It will also not handle overlapping effects, but that's rarely noticeable for screen effects.

Physics

Physics simulation of objects via the PhysX physics engine during play can add a great deal of realism and visual polish to your game, but it also has the potential to affect performance. In order to monitor and analyze the physics simulation, Unreal Engine 3 provides some built-in features to visualize and output information. Third party tools are also provided by nVidia which enable very detailed profiling capabilities.

Check out PhysX Profiling Home for how to use the PhysX profiling tools

Some other useful console commands to help you see what physics is doing:

LISTAWAKEBODIES: Use this when you are standing still, nothing is moving around, and Physics time is sky high. There might be a body that has not gone to sleep.
PHYSASSETBOUNDS: Shows where the bounds for the phys assets in the scene are.
nxvis collision: A good general command for seeing what the physics collision scene looks like.

Script Profiling

Log Messages

Lots of information is sent to the log. Over time the amount of debug logging can become overwhelming. Once that occurs real scary issues can be lost amongst the spam. To combat this we want to keep the Log basically empty and when we see issues being logged we want to fix them immediately.

Anytime you see a log warning/message, find out why it is occurring and fix the underlying issue. Don't let them pile up.

NOTE: when adding log warning / debug messages PLEASE PLEASE PLEASE add:

How to fix / where to look to fix the issue
What object is emitting the log

With those two added data points it makes it almost trivial to fix things!

An example of a log message you might encounter and the fix for it is:

Cooking Convex For __: You will see this message if someone has not set the PreCachedPhysScale. The game then has to create convex hulls for that object at runtime. That is slow; it can be done ahead of time, and it should be.
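One way to make the two required data points hard to forget is to route warnings through a helper that refuses to build a message without them. The following is a minimal standalone sketch. MakeActionableWarning and its message layout are hypothetical, not an engine API, and the actor name in the usage note is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Builds a warning that names the emitting object and tells the reader
// where to look for the fix. Both fields are required on purpose.
std::string MakeActionableWarning(const std::string& EmitterName,
                                  const std::string& Problem,
                                  const std::string& FixHint) {
    assert(!EmitterName.empty() && "say which object emitted the warning");
    assert(!FixHint.empty() && "say how to fix it or where to look");
    return "Warning: [" + EmitterName + "] " + Problem + " Fix: " + FixHint;
}
```

For the convex cooking case, a call such as MakeActionableWarning("StaticMeshActor_3", "Cooking convex hull at runtime.", "Set PreCachedPhysScale on the mesh.") yields a line that says both what is wrong and what to change.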

Garbage Collection

UnrealEngine3 uses GarbageCollection for part of its memory management strategy. We want to minimize the cost of doing the actual GC. We can do that by minimizing the number of objects we iterate over and the number of objects we are constantly spawning and GCing.

See Optimizing GC Performance for detailed information.

Also check out: DETAILED_PER_CLASS_GC_STATS in the code. Turning this on will show you the class of the objects that the GarbageCollector is having to look at and GC. When you see a specific type of object you should ask: "can I cache that object?". Caching it in a pool or on the Actor spawning it might be better than constantly spawning it only to have it GC'd
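The pooling idea can be sketched outside the engine as follows. This is a simplified standalone illustration, not how UE3 objects are actually allocated. Projectile and ProjectilePool are hypothetical stand-ins for whatever class shows up most often in the per-class GC stats.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a frequently spawned gameplay object (hypothetical type).
struct Projectile {
    float Speed = 0.0f;
    bool bActive = false;
};

// Keeps released objects around for reuse so they are not
// repeatedly allocated and then collected.
class ProjectilePool {
public:
    // Hands out a recycled object when one is free, otherwise allocates.
    Projectile* Acquire() {
        Projectile* Object = nullptr;
        if (!Free.empty()) {
            Object = Free.back();
            Free.pop_back();
        } else {
            Storage.push_back(std::unique_ptr<Projectile>(new Projectile()));
            Object = Storage.back().get();
        }
        Object->bActive = true;
        return Object;
    }

    // Returns an object to the pool instead of destroying it.
    void Release(Projectile* Object) {
        assert(Object != nullptr && Object->bActive);
        *Object = Projectile();  // reset state for the next user
        Free.push_back(Object);
    }

    // Number of objects ever allocated by this pool.
    std::size_t AllocatedCount() const { return Storage.size(); }

private:
    std::vector<std::unique_ptr<Projectile>> Storage;  // owns every object
    std::vector<Projectile*> Free;                     // currently unused
};
```

After a warm-up period the allocated count stops growing, which is the property you are after: steady-state gameplay creates no new objects for the collector to look at.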

Per-Frame Expensive Updates

Objects are updated / computed every frame, and some of them take far more time than others. This could be due to a large number of attachments, incorrect collision settings, or inefficient tick() code.

In UnActorComponent.cpp, there are a large number of defines which determine what we collect.

We need to turn the following "on"

LOG_DETAILED_COMPONENT_UPDATE_STATS
LOG_DETAILED_ACTOR_UPDATE_STATS

Once those are on you should get a huge amount of data in the logs. The objects at the top of the list are the more expensive ones and should be looked at first.

Additionally, if you have LOTS of small costing objects that are not near the player you should look at trying to find ways to put them into the TickableActors List or some other way to put them to sleep when the player is not around.

Slow UnrealScript Function Calls

Games should use UnrealScript to quickly prototype functionality. Once that is up and running, profile to see which functions are crazy slow, then either optimize the UnrealScript or convert those functions to C++. The define below provides a quick way to play through the level and have all of the slow calls logged to a file.

Additionally, this allows you to easily look for functions that are really slow that may happen only a few times but can help cause a hitchy frame.

Turn this define on: SHOW_SLOW_UNREALSCRIPT_FUNCTION_CALLS

Additionally there is SHOW_SLOW_UNREALSCRIPT_FUNCTION_CALLS_TAKING_LONG_TIME_AMOUNT that determines how "slow" the function call needs to be before it is logged.
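The general technique behind a "log only the slow calls" switch is a scoped timer compared against a threshold. The sketch below shows that pattern in standalone C++. It is not the engine's implementation of these defines. SlowCallLog and ScopedSlowCallTimer are hypothetical names, and the threshold is expressed in milliseconds as an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Collects the names of calls that ran for at least ThresholdMs.
struct SlowCallLog {
    double ThresholdMs;
    std::vector<std::string> Entries;

    explicit SlowCallLog(double InThresholdMs) : ThresholdMs(InThresholdMs) {
        assert(InThresholdMs >= 0.0);
    }
};

// Times the enclosing scope and records it only if it ran too long.
class ScopedSlowCallTimer {
public:
    ScopedSlowCallTimer(SlowCallLog& InLog, const std::string& InName)
        : Log(InLog), Name(InName), Start(std::chrono::steady_clock::now()) {}

    ~ScopedSlowCallTimer() {
        const std::chrono::duration<double, std::milli> Elapsed =
            std::chrono::steady_clock::now() - Start;
        if (Elapsed.count() >= Log.ThresholdMs) {
            Log.Entries.push_back(Name);
        }
    }

private:
    SlowCallLog& Log;
    std::string Name;
    std::chrono::steady_clock::time_point Start;
};
```

Raising the threshold keeps the log short enough to spot the rare call that contributes to a hitchy frame. Lowering it gives a broader picture at the cost of more spam.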

Spawning Is Slow

There are a lot of things that occur when an object is spawned / ConstructObject into the world. When AI's are flood spawned into the world that can cause massive hitches. So we want to reduce that amount.

The engine (on console) will alert you if you are spawning multiple pawns in a frame.

But you should keep in the back of your mind to not just spawn a huge amount of things at once.

Death is a great example of this. When a pawn dies there is usually:

Gore Mesh
Gibs
Playing sounds
Blood particles
Camera Particles
Decals
Death particles
Bullet impacts (from the bullets that caused the death)
Bullet impact sounds

That is a lot of stuff! If you were to spawn it all in one frame that would probably cause a hitch. Delaying a frame or spawning some of those over a number of frames more than likely will not be noticeable and will not cause the engine to hitch.
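Spreading spawns over frames can be as simple as a queue with a per-frame budget. The sketch below is a standalone illustration under that assumption. DeferredSpawnQueue is a hypothetical helper, effects are represented by name only, and the right budget depends on your game.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Defers spawn requests and releases at most MaxPerFrame of them per tick.
class DeferredSpawnQueue {
public:
    explicit DeferredSpawnQueue(std::size_t InMaxPerFrame)
        : MaxPerFrame(InMaxPerFrame) {
        assert(InMaxPerFrame > 0);
    }

    // Queue an effect by name instead of spawning it immediately.
    void Request(const std::string& EffectName) {
        Pending.push_back(EffectName);
    }

    // Call once per frame. Returns the effects to spawn this frame,
    // oldest first; the rest wait for later frames.
    std::vector<std::string> Tick() {
        std::vector<std::string> ToSpawn;
        while (!Pending.empty() && ToSpawn.size() < MaxPerFrame) {
            ToSpawn.push_back(Pending.front());
            Pending.pop_front();
        }
        return ToSpawn;
    }

    std::size_t PendingCount() const { return Pending.size(); }

private:
    std::size_t MaxPerFrame;
    std::deque<std::string> Pending;
};
```

With a budget of two, five death effects requested in the same frame are released over three frames. Anything the player must see immediately can bypass the queue.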

UnrealScript PreProcessor Code Exclusion

In UnrealScript, code can be excluded from shipping builds by using the `if(`notdefined(FINAL_RELEASE)) preprocessor command. This can help to keep extraneous code from being included and executed.

Use TickableActors List

Many times you will have a large level with lots of actors in the level that have some sort of logic that is done in Tick(). But when the player is not near them doing that logic is not important. So we need some way to not ever tick those actors so we are not wasting cpu cycles.

Basically, to utilize the TickableActors list you need to be able to get an event that "turns back on" the object. Foliage is a great example: do not tick it unless the player has just brushed up against it.

To move the Actor in and out of the TickableActors list you use: SetTickIsDisabled(TRUE/FALSE);

NOTE: For things that don't have a nice "event" way to do things you will need to create a manager or some other mechanism to determine when the "asleep" objects should be "woken up"

AInteractiveFoliageActor is a good example of an actor that uses this system
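For actors without a natural wake-up event, the manager mentioned in the note could be as small as a periodic distance check. The sketch below is standalone C++ with hypothetical types. SleepyActor is not the engine's AActor, its SetTickIsDisabled only mirrors the role of the real call, and the 2D distance test is a simplifying assumption.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for an actor (hypothetical; not the engine's AActor).
struct SleepyActor {
    float X = 0.0f;
    float Y = 0.0f;
    bool bTickIsDisabled = false;

    // Mirrors the role of SetTickIsDisabled described above.
    void SetTickIsDisabled(bool bDisabled) { bTickIsDisabled = bDisabled; }
};

// Wakes actors near the player and puts the rest to sleep.
// Intended to run occasionally, not necessarily every frame.
void UpdateSleepState(std::vector<SleepyActor>& Actors,
                      float PlayerX, float PlayerY, float WakeRadius) {
    assert(WakeRadius >= 0.0f);
    const float RadiusSq = WakeRadius * WakeRadius;
    for (SleepyActor& Actor : Actors) {
        const float Dx = Actor.X - PlayerX;
        const float Dy = Actor.Y - PlayerY;
        const bool bNear = (Dx * Dx + Dy * Dy) <= RadiusSq;
        Actor.SetTickIsDisabled(!bNear);
    }
}
```

The manager itself has to stay cheap, so comparing squared distances and running it only every few frames are both reasonable choices.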

Useful Exec Commands

Basically, there are LOTS of nice exec commands which do cool things for perf and memory. The issue is that they are normally buried in the specific code that deals with that subsystem, or listed on the page that deals with that subsystem. So it is pretty hard to find them all.

Additionally, not all of them have a standard naming convention.

We try to list out some of the more useful ones here.

Basically, most exec commands need to have a #define turned on somewhere. So it is often best to take the exec command, FindInFiles for it, and see what other things need to be active for it to work.

Also, not all commands output to the log. Some will create files; some will create data in memory that then needs a "Dump" command before it is logged.

FindInFiles is your friend!

The Exec() function in UnPlayer.cpp has a large number of exec commands.

UnLevTic.cpp has a large number of G______ vars that are toggled on and off to allow certain logging.

SHOWSKELCOMPTICKTIME
SHOWLIGHTENVS
SHOWISOVERLAPPING
LISTAWAKEBODIES
TOGGLECROWDS
MOVEACTORTIMES
PHYSASSETBOUNDS
FRAMECOMPUPDATES
FRAMEOFPAIN
SHOWSKELCOMPLODS
SHOWSKELMESHLODS
SHOWFACEFXBONES
SHOWFACEFXDEBUG
LISTSKELMESHES
LISTPAWNCOMPONENTS
TOGGLELINECHECKS / DUMPLINECHECKS / RESETLINECHECKS

Line Checks

Problem:

Too much time is being spent performing Line Checks (calls to SingleLineCheck or MultiLineCheck.)

Solutions:

1. If you can trace back to an obvious, known caller of the function, optimize that caller so it does not perform line checks as often as it does.

2. If you can't trace back to the caller of the function because the function is used by almost every actor (i.e. MoveActor calling LineCheck or performPhysics calling, at the end, LineCheck), you need to know which actor that was.

You can enable tracing of line checks using the following commands: TOGGLELINECHECKS, TOGGLELINECHECKSPIKES, DUMPLINECHECKS, RESETLINECHECKS

These only work on PC, but they help to identify who the callers of the function were. This activates capturing of all calls to line check, along with the callstack and the callers. If you'd like to flush the captured data, use DUMPLINECHECKS. The file is stored in [Gamename]/Logs/ as a csv file.

After line check tracing is enabled, you'll see the following log message coming up:

Log: Line tracing is now enabled.
Log: Stack tracking is now enabled.
Log: Script stack tracking is now enabled.

Below is an example of the file. As you can see, GearPawn_COGMarcus_0 called the function 7042 times over 1006 frames. You can find out how many calls were made by which actor via which function. The log also shows whether or not the trace was NonZeroExtent. In this way, you can isolate the issues and bottlenecks to the call and optimize.

Log: Log file open    08/07/08 13:45:23
Log: Captured 74 unique callstacks totalling 13375 function calls over 1006 frames    averaging 13.30 calls/frame
Log:    7042
         UWorld::MultiLineCheck() 0xa0c718   + 39 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevact.cpp:2037]
         UGearGameplayCamera::PreventCameraPenetration() 0x1880c6b  + 0 bytes [File=d:\code\unrealengine3\development\src\geargame\src\gearcamera.cpp:769]
         UGearGameplayCamera::PlayerUpdateCameraNative() 0x188f913  + 0 bytes [File=d:\code\unrealengine3\development\src\geargame\src\gearcamera.cpp:1324]
         UGearGameplayCamera::execPlayerUpdateCameraNative() 0x18922de  + 21 bytes [File=d:\code\unrealengine3\development\src\geargame\inc\geargamecameraclasses.h:401]
         UObject::CallFunction() 0x5f9a86   + 0 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:5633]
         UObject::execFinalFunction() 0x5fbdac   + 28 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:1645]
         UObject::ProcessInternal() 0x5f637e   + 30 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:5862]
         UObject::CallFunction() 0x5f9d15   + 0 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:5803]
         UObject::execVirtualFunction() 0x5fbd7f   + 45 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:1638]
         UObject::execContext() 0x5efa27   + 0 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:1628]
         UObject::ProcessInternal() 0x5f637e   + 30 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:5862]
         UObject::CallFunction() 0x5f9d15   + 0 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:5803]
         UObject::execVirtualFunction() 0x5fbd7f   + 45 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:1638]
         UObject::ProcessInternal() 0x5f637e   + 30 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:5862]
         UObject::ProcessEvent() 0x602baf   + 0 bytes [File=d:\code\unrealengine3\development\src\core\src\uncorsc.cpp:6012]
         AActor::ProcessEvent() 0x80d293   + 22 bytes [File=d:\code\unrealengine3\development\src\engine\src\unactor.cpp:1320]
         UWorld::Tick() 0xa16256   + 59 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevtic.cpp:2998]
   ....

Log:
         NonZeroExtent
         GearPawn_COGMarcus_0 (7042) : No_Detailed_Info_Specified

Log:    358
         UWorld::MultiLineCheck() 0xa0c718   + 39 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevact.cpp:2037]
         UWorld::MoveActor() 0xa0fc72   + 0 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevact.cpp:1416]
         APawn::physWalking() 0xaa620d   + 0 bytes [File=d:\code\unrealengine3\development\src\engine\src\unphysic.cpp:838]
         APawn::startNewPhysics() 0xaa5943   + 0 bytes [File=d:\code\unrealengine3\development\src\engine\src\unphysic.cpp:469]
         APawn::performPhysics() 0xa92eb2   + 0 bytes [File=d:\code\unrealengine3\development\src\engine\src\unphysic.cpp:409]
         AGearPawn::performPhysics() 0x188a6b4  + 0 bytes [File=d:\code\unrealengine3\development\src\geargame\src\geargame.cpp:5406]
         AActor::TickAuthoritative() 0xa00467   + 20 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevtic.cpp:721]
         AActor::Tick() 0xa00571   + 16 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevtic.cpp:966]
         TickActors<FDeferredTickList::FGlobalActorIterator>() 0xa0e7fc   + 0 bytes [File=d:\code\unrealengine3\development\src\engine\src\unlevtic.cpp:2329]
         UWorld::Tick() 0xa15708   + 37 bytes [File=d:\code\unrealengine3\development\src\engine
   ....

It also captures script traces, but a script trace does not provide actor information yet. If you'd like to reset (clear all captures), use RESETLINECHECKS. Please make sure that you run RESETLINECHECKS after you turn tracing off to clear the buffer.

TOGGLELINECHECKSPIKES takes an argument of the number of line checks in any given frame to set off a dump of the line check information. For example TOGGLELINECHECKSPIKES 50 will only dump statistics in those frames where 50 line checks or more were performed. This is useful for automating the process and only capturing data in specific frames where thresholds are met.

3. Suggestions

Cache: If it is not critical to have the result on that exact frame, cache the value and reuse it for a number of frames.
Frame control: Only handle a fixed number of line checks in a given frame. Anything over that number is done in the next frame.
Bake: Look for a bake solution to reduce run-time calls. If you can bake the information into a node or actor, you don't have to make the call at run-time.
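The caching suggestion can be sketched as a small wrapper that re-runs the expensive check only every few frames. This is a standalone illustration. CachedLineCheck is a hypothetical helper, the real line check is represented by a callback, and the refresh interval is something you would tune per use.

```cpp
#include <cassert>
#include <functional>

// Caches a boolean line check result and only re-runs the real
// check every RefreshInterval frames.
class CachedLineCheck {
public:
    CachedLineCheck(std::function<bool()> InCheck, int InRefreshInterval)
        : Check(InCheck), RefreshInterval(InRefreshInterval) {
        assert(InRefreshInterval > 0);
    }

    // Call once per frame with the current frame number.
    bool Query(int Frame) {
        if (!bHasResult || Frame - LastFrame >= RefreshInterval) {
            bLastResult = Check();  // the expensive call
            LastFrame = Frame;
            bHasResult = true;
        }
        return bLastResult;
    }

private:
    std::function<bool()> Check;
    int RefreshInterval;
    int LastFrame = 0;
    bool bHasResult = false;
    bool bLastResult = false;
};
```

With an interval of 5, ten consecutive frames cost two real checks instead of ten. The trade-off is that the answer can be up to four frames stale, so this only suits queries where that is acceptable.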

Player/AI Profiling

AI Logging

AIs do a lot of interesting things some of which can be computationally expensive. Utilizing AILogging we can get a view into what they were doing and why they might have gotten stuck or in some loop that is doing expensive actions.

In your DefaultAI.ini under each AIController class set bAILogging=TRUE if you want AILogging.

This will output a log file in your log directory with all of the `AILog() that have been used in your code.

Move Actor

When calls to MoveActor are showing up high in the inclusive time, there can be several reasons, but they fall into 3 main categories:

There are too many MoveActor calls being performed
The settings on the Actors being moved are causing the function to be expensive
The Actor being moved has many other Actors attached

There are 3 parts of MoveActor function itself that generally contribute to its slowness:

A non-zero extent line-check (aka. swept box check) is performed along the movement of the Actor. This is done for physics modes like PHYS_Walking.
An encroachment overlap check is performed at the new location of the Actor. This is done for movers and vehicles for example.
To a lesser extent, updating the collision data structure when the Actor moves.

There is some logging you can enable that will show you all MoveActor calls executed in a frame. First define MOVEACTOR_STATS to 1 at the top of UnLevAct.cpp. Additionally, there is SHOW_MOVEACTOR_TAKING_LONG_TIME, which can be less spammy and just shows specific actors that are taking a long time.

Then in the game type MOVEACTORTIMES during a period when MoveActor times are high. You will get some logging like this:

   Log: MOVE - GearPawn_COGRedShirt_2 0.037ms 1 0
   Log: MOVE - GearPointOfInterest_5 0.002ms 0 0
   Log: MOVE - SkeletalMeshActor_28 0.019ms 0 0
   Log: MOVE - Emitter_32 0.003ms 0 0
   Log: MOVE - Emitter_33 0.002ms 0 0
   Log: MOVE - Emitter_35 0.001ms 0 0
   Log: MOVE - Emitter_47 0.002ms 0 0

Each line is a call to MoveActor that happened during the frame. Indented lines indicate that the call was a result of the Base moving. So when SkeletalMeshActor_28 moved, that caused MoveActor to be called on all the Emitters attached to it. The timing numbers are inclusive - so they include the MoveActor times of attached things. The first number (1 or 0) indicates if the extent line check was performed as part of the move, to see if the actor hit a wall or a trigger. The second number indicates if an encroachment check was performed as part of the move. Note that physics modes such as PHYS_Walking can result in multiple MoveActor calls in one frame - this is expected.
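If you capture a lot of these lines, a small parser makes it easier to find the worst offenders. The sketch below is a standalone C++ illustration that assumes every line has exactly the shape shown in the sample output. MoveActorEntry, ParseMoveActorLine, and SlowestMove are hypothetical names, and the sketch does not try to recover the parent/child relationship that indentation conveys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One parsed "Log: MOVE - ..." line.
struct MoveActorEntry {
    std::string ActorName;
    float TimeMs = 0.0f;            // inclusive time for this call
    bool bExtentLineCheck = false;  // first flag in the log line
    bool bEncroachCheck = false;    // second flag in the log line
};

// Parses a line such as "Log: MOVE - Emitter_32 0.003ms 0 0".
// Returns false if the line does not have that shape.
bool ParseMoveActorLine(const std::string& Line, MoveActorEntry& OutEntry) {
    char Name[256];
    float TimeMs = 0.0f;
    int LineCheck = 0;
    int Encroach = 0;
    const int Matched = std::sscanf(Line.c_str(), " Log: MOVE - %255s %fms %d %d",
                                    Name, &TimeMs, &LineCheck, &Encroach);
    if (Matched != 4) return false;
    OutEntry.ActorName = Name;
    OutEntry.TimeMs = TimeMs;
    OutEntry.bExtentLineCheck = (LineCheck != 0);
    OutEntry.bEncroachCheck = (Encroach != 0);
    return true;
}

// Returns the entry with the largest inclusive time.
const MoveActorEntry& SlowestMove(const std::vector<MoveActorEntry>& Entries) {
    assert(!Entries.empty());
    std::size_t Best = 0;
    for (std::size_t i = 1; i < Entries.size(); ++i) {
        if (Entries[i].TimeMs > Entries[Best].TimeMs) Best = i;
    }
    return Entries[Best];
}
```

Because the times are inclusive, the slowest entry is often a base Actor with many attachments, so treat the result as a starting point for investigation, not a verdict.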

We will now take a look at the 3 slow parts of MoveActor, and how to avoid doing them.

The extent line check during a move will not be performed if any of the following are true:

The actor is considered an 'encroacher' (i.e. in PHYS_RigidBody or PHYS_Interpolating, and bCollideActors is TRUE)
bCollideActors and bCollideWorld are FALSE
There is no CollisionComponent

An Actor only really needs to perform the encroachment point-check in the following instances:

It needs to push other unreal-physics Actors around (e.g. walking Pawns)
You need to know when the Actor hits a trigger
You need the PhysicsVolume of the Actor to be updated

If you don't need any of these things (e.g. an effect class attached to a character), you can set the bNoEncroachCheck flag on the Actor to TRUE.

Setting bCollideActors to FALSE will stop the Actor being added or updated in the collision octree. This will slightly speed up MoveActor times, but of course will mean the Actor cannot be hit by line checks.

NavMesh Performance

The NavigationMesh can be used for a large number of spatial queries. Having a large number of constraints, or conversely not enough constraints so that LOTS of polys have to be looked at, can be slow.

Additionally, it can be modified at runtime, which can be slow. If you are doing a lot of obstacle mesh creation each frame, that can cause higher than desired frame times.

Turning on the PERF_NAVMESH_TIMES define in UnNavigationMesh.cpp will output where the time is going inside the navmesh.

Skeletal Mesh Component Updates

If calls to SkeletalMeshComponent::Tick() are showing up high in the inclusive time, you need to examine the cause for this. Because of the complexity of animation on many characters in UE3 games, the animation system is often one of the more expensive parts of the engine on CPU. It is important though to know exactly where that time is going, to optimize it as much as possible. There are some logging tools that allow you to see what is going on.

First set the SHOW_SKELETAL_MESH_COMPONENT_TICK_TIME define to 1. Then in the game type SHOWSKELCOMPTICKTIME in the console. It will log all SkeletalMeshComponents ticked in a frame, along with how long they took to tick, broken down into sections. The first line is column headers, so you can paste this info into a spreadsheet for more analysis.

Log: SkelMeshComp: Name SkelMeshName Owner TickTotal UpdatePoseTotal TickNodesTotal UpdateTransformTotal UpdateRBTotal
Log: SkelMeshComp: map_01.TheWorld:PersistentLevel.Pawn_Big_Ogre_0.SkeletalMeshComponent_6 OgreMeshes.Big_Ogre Pawn_Big_Ogre_0 0.006564 0.003211 0.000000 0.000769 0.000000
Log: SkelMeshComp: map_01.TheWorld:PersistentLevel.Weap_Crossbow_1.SkeletalMeshComponent_7 CrossbowMeshes.WoodenCrossbow Pawn_Big_Ogre_0 0.008103 0.000000 0.005938 0.000000 0.000000

SkelMeshName	The name of the mesh used by this SkeletalMeshComponent.
Owner	Name of the Actor to which this SkeletalMeshComponent is attached.
TickTotal	The total time it took to call Tick on this component.
UpdatePoseTotal	This time includes all animation blending (GetBoneAtoms), controller evaluation and pose matrix building.
TickNodesTotal	This is how long was spent calling TickAnimNode on all the AnimNodes
UpdateTransformTotal	How long was taken to update the component (updating its transform, attached Actors, and sending info to rendering thread). This time is in addition to the Tick time.
UpdateRBTotal	If the SkeletalMeshComponent has a physics-engine representation (PhysicsAssetInstance), this is the time taken to update it based on the animation results.
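As an alternative to a spreadsheet, the columns can be read programmatically. The sketch below is a standalone C++ illustration that assumes whitespace-separated fields in the column order listed above. SkelCompTiming and ParseSkelCompLine are hypothetical names. TotalCost adds UpdateTransformTotal to TickTotal because the table describes that time as being in addition to the Tick time; how the other columns relate to TickTotal is not assumed here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One parsed "Log: SkelMeshComp: ..." data line.
struct SkelCompTiming {
    std::string Name;
    std::string SkelMeshName;
    std::string Owner;
    double TickTotal = 0.0;
    double UpdatePoseTotal = 0.0;
    double TickNodesTotal = 0.0;
    double UpdateTransformTotal = 0.0;
    double UpdateRBTotal = 0.0;

    // Tick time plus the transform update, which is reported separately.
    double TotalCost() const { return TickTotal + UpdateTransformTotal; }

    // Share of the tick spent building the pose.
    double PoseFraction() const {
        assert(TickTotal > 0.0);
        return UpdatePoseTotal / TickTotal;
    }
};

// Returns false for the header line or any line that does not parse.
bool ParseSkelCompLine(const std::string& Line, SkelCompTiming& Out) {
    std::istringstream Stream(Line);
    std::string LogTag, CompTag;
    SkelCompTiming Parsed;
    if (!(Stream >> LogTag >> CompTag
                 >> Parsed.Name >> Parsed.SkelMeshName >> Parsed.Owner
                 >> Parsed.TickTotal >> Parsed.UpdatePoseTotal
                 >> Parsed.TickNodesTotal >> Parsed.UpdateTransformTotal
                 >> Parsed.UpdateRBTotal)) {
        return false;
    }
    if (LogTag != "Log:" || CompTag != "SkelMeshComp:") return false;
    Out = Parsed;
    return true;
}
```

Sorting the parsed entries by TotalCost, or grouping them by Owner, quickly shows whether the time is going to a few heavy characters or to many cheap components.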

The first thing to check is that you are not updating animations on meshes that are not required. Here are some methods for improving this:

Use streaming levels to only load skeletal mesh instances when required.
Hide actors using skeletal meshes when not needed. SkeletalMeshActors are put in 'stasis' when hidden, which causes them not to be ticked at all.
Set bUpdateSkelWhenNotRendered to FALSE if possible. Note that this may cause problems if you rely on animation notifies or root motion for gameplay, as characters off the screen will stop animating. This option defaults to FALSE for SkeletalMeshActors.

Once you have ensured that you are only updating animation for those Actors that are necessary, there are some things you can do to optimize the animation update itself.

Avoid too many 'partial blends'. When a blend node has one input at 100%, it is very cheap on the CPU, because data is simply 'passed through'. However, when multiple inputs are blended, this uses a lot more CPU.
Keep trees small if possible to avoid ticking and blending too many nodes. Change the animations on one node using code, rather than adding a node for every animation.
Set bSkipTickWhenZeroWeight on nodes where possible. This stops nodes being ticked when they are not relevant (ie. zero weight in the final animation blend). This also causes blend nodes to blend to 100% immediately when not relevant, thus avoiding 'partial blends'.
If you cannot set bUpdateSkelWhenNotRendered to FALSE, set bIgnoreWhenNotRendered on specific SkelControls where possible.
Set bForceLocalSpaceBlend to TRUE on AnimNodeBlendPerBones where possible, as this uses less CPU.

Other skeletal optimization tips:

Reduce number of meshes you have to draw. For example in Gears enemies don't display weapon attachments on their backs.
Reduce number of Bones, use mesh LODs to reduce complexity at a distance.
Set bForceRefpose=TRUE on SkeletalMeshComponents that don't need to be animated. (Weapons in Gears use this when they're used as attachments, or as pickups).
Use UAnimTree::SetUseSavedPose() to freeze animation and reduce AnimTree to a single node when character dies and goes to ragdoll.
AnimNodeSequence nodes with bSkipTickWhenNotRelevant, will not extract animation when the mesh is not rendered (they will use a cached frame, unless root motion needs to be extracted). This reduces Animation Extraction cost when meshes are not visible.
When an AnimNodeBlendList has bSkipBlendWhenNotRendered=TRUE and the mesh is not rendered, blends are skipped. This saves on blending cost; transitions are done instantly.
If you can share branches within the tree, do so. For instance you have several branches that can potentially use a common path (say for example an AimOffset node, and a looping animation). Do not duplicate this, but have the tree use that same branch. Branches are only evaluated once and cached, so in that example the work will be done only once and can be used in multiple branches.
A tree can be large and contain many nodes, but for performance you want to reduce the number of nodes being ticked each frame (use bSkipTickWhenZeroWeight to reduce that number), and design the tree so you don't have too many animations extracted and blended at once.
Use AnimNodeSlots to play animations on demand from script, as opposed to having every single animation played existing as a node in the tree. AnimNodeSlots work well for one shot actions. Gears uses these for weapon reloads, weapon switching, all special moves (mantles, swat turn, cover slip, evades), ladder interactions, chainsaw duels, cringes, hit reactions, death animations, button/lever interactions, etc.
If, given two animations, you can't blend from one to the other, they don't both need to exist in the same tree. The design philosophy of Marcus' AnimTree in Gears is that movement animations and controls (walking, running, idle, cover leaning, pop up) exist in the tree, as well as aiming. Any one-time action, such as those mentioned above, is played on demand through the use of AnimNodeSlots. This greatly reduces the complexity of the AnimTree, and yet allows a lot of different animations to be played.