OTAPI

Beneath the hood of RAD Studio


30th September 2014
by Simon J Stuart

Game Engine Architecture – Game Loop vs Parallelism (Part 1)

Introduction

When it comes to developing a game engine (whether it be reusable or the engine powering a single specific game) there are many alternative approaches that can be taken at almost every single stage of the design.

For this reason (and I say this from personal experience) the single most critical part of the development process is the designing of the overall architecture, before a single line of code is ever written. If you don’t decide upon an overall architecture before your development begins, you will almost certainly find yourself scrapping the codebase and starting over each time you realize that there’s a better way your engine could have been designed.

So, in this series of articles, I’m going to share with you the various “architectural rethinks” that have occurred during the development of my newest game engine (called “AGE” which stands for “Another Game Engine”).

The single most significant of these decisions is covered in this first part of the series.

It is my hope that you can benefit from my otherwise-wasted time, and skip the series of poor decisions I made along the way; particularly those which necessitated a complete ground-up redevelopment.

I would like to point out that there will not be any usable code provided in this series of articles. This is because the principles discussed apply universally (to any object-oriented programming language). Some generic code examples may be used where appropriate, but they cannot be assembled into a functional game engine as they will merely be individual, unrelated snips of illustrative code (as opposed to segments of a working example). These code samples will be written in Pascal, as it is my favourite programming language, and is easily translated into other languages.

Flowcharts will be used extensively to illustrate the relationships between subsystems, and individual logical operations.

Please be sure to read the whole article before you begin your engine design. The reason will become apparent as you read further!

What is a “Game Loop”?

Put simply, the “Game Loop” advances the Physics, Logic (including AI) and Rendering. Engines utilizing the Game Loop approach place the Game Loop at the absolute heart of their architecture.

Game Loops, as with any approach to just about anything when it comes to game development, have advantages and disadvantages.

On the plus side, a Game Loop can be laughably simple to implement. There are several different kinds of Game Loop, but they all operate on a principle of Differential Time (or Delta Time), which is literally the amount of time it took for the previous iteration of the Loop to execute. Any Game Loop that doesn’t take Delta Time into consideration is doomed to complete and total failure, as there is no way to ensure that the Physics, Logic/AI, and Rendering progress at a consistent rate (resulting in very poor gameplay, and making multiplayer an absolute impossibility).

A simple Game Loop itself can be implemented in mere minutes (or less).

A Simple Game Loop

Simple Game Loop


Note that the order of execution is not arbitrary. Since Physics dictates the Position and Angle of world objects, and the Logic and AI need the latest information to provide accurate behaviour, Physics needs to be updated first. Likewise, since the Logic and AI could very well influence the rendered appearance of the game world (such as the current Sprite state and Particle Effects), we need to update Logic and AI second. Rendering must therefore be performed last.

The Game Loop illustrated above operates at an unfixed rate (so, as fast as the hardware possibly can). Of course, the advantage is that this Game Loop is laughably easy to produce:

var
  LCurrentTime, LLastTime, LDeltaTime: Double;
begin
  LLastTime := GetReferenceTime;
  while GameRunning do
  begin
    // Read the clock once per cycle, so Delta Time covers the whole previous cycle
    LCurrentTime := GetReferenceTime;
    LDeltaTime := LCurrentTime - LLastTime;
    LLastTime := LCurrentTime;
    UpdatePhysics(LDeltaTime);
    UpdateLogic(LDeltaTime);
    RenderFrame(LDeltaTime);
  end;
end;

GetReferenceTime returns the current time (in seconds) with extremely high precision, typically by reading a high-resolution counter and dividing its tick count by that counter’s frequency. Different programming languages and standard libraries provide their own counterparts that do the same thing.

GameRunning is an external boolean value, function or flag to dictate whether the Loop should continue or break.
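GetReferenceTime is left abstract above. As one possible sketch (an assumption, not the engine’s actual code), Delphi’s System.Diagnostics.TStopwatch exposes the raw high-resolution counter and its frequency; under Free Pascal the sketch falls back to GetTickCount64 from SysUtils, which only has millisecond resolution:

```pascal
program ReferenceTimeSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
{$IFDEF FPC}
  SysUtils;
{$ELSE}
  System.SysUtils, System.Diagnostics;
{$ENDIF}

// Returns the current reference time in seconds. Only the difference between
// two calls is meaningful: that difference is the Delta Time.
function GetReferenceTime: Double;
begin
{$IFDEF FPC}
  Result := GetTickCount64 / 1000; // Free Pascal fallback: millisecond resolution only
{$ELSE}
  // Raw ticks of the high-resolution counter divided by ticks per second
  Result := TStopwatch.GetTimeStamp / TStopwatch.Frequency;
{$ENDIF}
end;

var
  LStart: Double;
begin
  LStart := GetReferenceTime;
  Sleep(20);
  WriteLn('Delta Time: ', (GetReferenceTime - LStart):0:4, ' s'); // roughly 0.02
end.
```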

One disadvantage of this particular Game Loop is that, while it accounts for the time differential (Delta Time) between cycles, every cycle must update all three components (Physics, Logic and Rendering). All three therefore run at exactly the same rate, and that rate is limited by their combined cost: a slow Render holds back the Physics and Logic, and vice versa.

Rendering needs to occur (at minimum) at the same rate as that at which the screen is refreshed (Refresh Rate or Vertical Sync Rate). Typically this is 60 times each second for the vast majority of computer monitors (120 times a second for 3D monitors). Ideally, you want to Render at double the refresh rate of the monitor. This will provide the smoothest possible appearance, and any framerate greater than this will simply not be visible to a player (meaning you’re wasting cycles, reducing performance unnecessarily, and unduly burdening the player’s hardware).

Physics and Logic, on the other hand, need only occur at a suitable base rate. For the majority of 2D games, and even some 3D games, 30 updates per second is usually very adequate. Modern 3D First-Person Shooter [FPS] games ideally want to update the Physics and Logic 60 times every second.

So, what we can do to improve this Game Loop is modify it to fix the rate of the Physics and Logic updates. This will free up more cycles for Rendering.

A Rate-Limited Game Loop

Okay, so the most common implementation of a Rate-Limited Game Loop enforces a limit on both the Physics and Logic, leaving Rendering unlimited. This way, more CPU time is given for Rendering, without necessarily impeding the performance of the Physics and Logic processing.

Rate-Limited Game Loop


Here’s what this looks like in code:

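const
  PHYSICS_RATE_LIMIT = 1 / 30.00; // Seconds between Physics and Logic updates (30 per second)
var
  LCurrentTime, LLastTime, LDeltaTime: Double;
  LLastPhysicsTime, LPhysicsUpdateTime: Double;
begin
  LLastTime := GetReferenceTime;
  LLastPhysicsTime := LLastTime;
  LPhysicsUpdateTime := LLastTime; // We want the first Physics and Logic update to occur immediately
  while GameRunning do
  begin
    LCurrentTime := GetReferenceTime;
    LDeltaTime := LCurrentTime - LLastTime; // Time since the previous cycle (used for Rendering)
    LLastTime := LCurrentTime;
    if LCurrentTime >= LPhysicsUpdateTime then
    begin
      // Physics and Logic advance by the time since *their* last update, not since the last frame
      UpdatePhysics(LCurrentTime - LLastPhysicsTime);
      UpdateLogic(LCurrentTime - LLastPhysicsTime);
      LLastPhysicsTime := LCurrentTime;
      LPhysicsUpdateTime := LCurrentTime + PHYSICS_RATE_LIMIT;
    end;
    RenderFrame(LDeltaTime);
  end;
end;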

Okay, this code snip (above) adds a simple Rate Limiter to both the Physics and Logic. Now each time the Loop cycles, it will check whether the current Reference Time is equal to or later than the indicated time for the next Physics and Logic update (LPhysicsUpdateTime).

Not so difficult, right? Okay, but this still leaves potential for significant problems! For one thing, the Rate Limit can only be an advantage if the time it takes to Render a frame does not exceed that limit.
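To put numbers on that limitation, the following sketch replays the rate-limiting decision from the loop above on a simulated clock rather than GetReferenceTime. It assumes Physics and Logic take no time and that every RenderFrame call costs a fixed number of microseconds, and it counts in integer microseconds so the arithmetic is exact; CountPhysicsUpdates is a hypothetical helper, not part of any engine.

```pascal
program RateLimitSimulation;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  PHYSICS_INTERVAL_US = 1000000 div 30; // 1/30 s in microseconds (33333)

// Replays the rate-limited loop on a fake clock and returns how many
// Physics/Logic updates happen within ADurationUs microseconds.
// ARenderCostUs must be greater than zero, or the clock never advances.
function CountPhysicsUpdates(ARenderCostUs, ADurationUs: Int64): Integer;
var
  LClockUs, LPhysicsUpdateTimeUs: Int64;
begin
  Result := 0;
  LClockUs := 0;
  LPhysicsUpdateTimeUs := 0; // the first update happens immediately
  while LClockUs < ADurationUs do
  begin
    if LClockUs >= LPhysicsUpdateTimeUs then
    begin
      Inc(Result); // UpdatePhysics + UpdateLogic (assumed to take no time)
      LPhysicsUpdateTimeUs := LClockUs + PHYSICS_INTERVAL_US;
    end;
    Inc(LClockUs, ARenderCostUs); // RenderFrame consumes ARenderCostUs
  end;
end;

begin
  WriteLn('1 ms per frame:   ', CountPhysicsUpdates(1000, 1000000), ' updates/s');
  WriteLn('50 ms per frame:  ', CountPhysicsUpdates(50000, 1000000), ' updates/s');
  WriteLn('100 ms per frame: ', CountPhysicsUpdates(100000, 1000000), ' updates/s');
end.
```

It prints 30, 20 and 10 updates per second respectively. Once a frame takes longer than the 1/30 second Rate Limit, Physics and Logic can only run once per frame, so their rate falls to the frame rate.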

An “Improved” Rate-Limited Game Loop

One way we can further improve the previous Game Loop example would be to introduce a Rate-Limit on Rendering.

Rate-Limiting the Rendering is a good idea because it ensures that our game isn’t forcing the player’s hardware to work harder than it needs to. Since we need only render at an absolute maximum rate of double the refresh rate of the player’s monitor, it makes absolutely no sense to allow the Game Loop to Render at any higher rate than that.

However, since different monitors have different refresh rates, we should not consider the rate limit for Rendering to be a fixed (constant) number, and should instead make it a setting (or “property”) that can be specified either automatically by having the game engine retrieve the refresh rate of the monitor on initialization, or by allowing the player to specify their own rate limit in the game’s “Advanced Graphics Options” menu.
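As a minimal sketch of such a setting, assuming the refresh rate has already been obtained from the platform (how to query it is not shown) and following this article’s rule of thumb of rendering at double the refresh rate, the two hypothetical helpers below produce an FPS limit and convert it into the minimum number of seconds between frames, with zero meaning “unlimited”:

```pascal
program RenderRateLimitSettings;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Default FPS limit: double the monitor's refresh rate.
// ARefreshRateHz would come from a platform API (not shown);
// 0 means "unknown", in which case Rendering is left unlimited.
function DefaultFPSLimit(ARefreshRateHz: Integer): Integer;
begin
  if ARefreshRateHz > 0 then
    Result := ARefreshRateHz * 2
  else
    Result := 0;
end;

// Converts an FPS limit (frames per second) into the minimum number of
// seconds between frames. An FPS limit of 0 means unlimited: no wait.
function FrameIntervalFromLimit(AFPSLimit: Integer): Double;
begin
  if AFPSLimit > 0 then
    Result := 1 / AFPSLimit
  else
    Result := 0;
end;

var
  LFPSLimit: Integer;
begin
  // e.g. a 60 Hz monitor; a player could override this in the options menu
  LFPSLimit := DefaultFPSLimit(60);
  WriteLn('FPS limit: ', LFPSLimit, ', seconds per frame: ',
    FrameIntervalFromLimit(LFPSLimit):0:6);
end.
```

With a 60 Hz monitor this prints an FPS limit of 120 and 0.008333 seconds per frame. Keeping the setting in frames per second (what a player would enter in an options menu) and converting it to seconds only where the loop needs it keeps the two units from being mixed up.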

Improved Rate-Limited Game Loop


This diagram (above) illustrates the logical progression of this Game Loop. Everything in yellow occurs on every Tick, everything in green only occurs if a logical operation evaluates as True, and everything in red occurs only if a logical operation evaluates as False.

Here’s what this game loop could look like in code:

const
  PHYSICS_RATE_LIMIT = 1 / 30.00; // Seconds between Physics and Logic updates (30 per second)
var
  LCurrentTime, LLastTime, LDeltaTime: Double;
  LLastPhysicsTime, LPhysicsUpdateTime, LRenderUpdateTime: Double;
begin
  LLastTime := GetReferenceTime;
  LLastPhysicsTime := LLastTime;
  LPhysicsUpdateTime := LLastTime; // We want the first Physics and Logic update to occur immediately
  LRenderUpdateTime := LLastTime; // We want the first Rendering update to occur immediately too
  while GameRunning do
  begin
    LCurrentTime := GetReferenceTime;
    if LCurrentTime >= LPhysicsUpdateTime then
    begin
      UpdatePhysics(LCurrentTime - LLastPhysicsTime);
      UpdateLogic(LCurrentTime - LLastPhysicsTime);
      LLastPhysicsTime := LCurrentTime;
      LPhysicsUpdateTime := LCurrentTime + PHYSICS_RATE_LIMIT;
    end;
    // FPSLimit is in frames per second; zero means "unlimited"
    if (FPSLimit = 0) or (LCurrentTime >= LRenderUpdateTime) then
    begin
      LDeltaTime := LCurrentTime - LLastTime; // Time since the previous rendered frame
      LLastTime := LCurrentTime;
      RenderFrame(LDeltaTime);
      if FPSLimit > 0 then
        LRenderUpdateTime := LCurrentTime + 1 / FPSLimit; // Frames per second -> seconds per frame
    end;
  end;
end;

FPSLimit represents the variable or property setting specifying how many frames per second the player desires. This implementation also accounts for the possibility of an unlimited rate for Rendering by checking whether the FPSLimit is set to zero.

There are countless other ways you can arguably improve on the Rate-Limited Game Loop, however…

The reason why Game Loops are a really bad idea, regardless of their complexity.

Regardless of how sophisticated your Game Loop becomes, you simply cannot escape one critically limiting, fundamental flaw: by using a Game Loop, you’re limiting all three processes to a single CPU core.

This means that your engine completely ignores the other cores available on the device, and these days, even relatively inexpensive mobile devices have at least two CPU cores, most Desktop systems now have 4, and there are even 6 and 8 core CPUs already on the market… my main development system has 16 cores [2x 8 core CPUs].

This is why I have since chosen to reject the Game Loop entirely, which pretty much renders this entire section pointless. Well, it would be pointless were it not for the lessons you’ve hopefully learned from reading it.

A Parallel Game Engine

So, we’ve discussed the most common (somewhat lazy) method of moving a game engine along by way of a Game Loop, and we now know why that method is a bad idea. Let’s take a look at a much better approach, shall we?

Parallel Processing (Phase 1)


Rather than having a single processing Thread looping through the three separate processes, let’s divide the processes up onto separate Threads.

This diagram (above) illustrates the point, with Physics and Logic rolled into one Thread, and Rendering on another.

The reason we bundle Physics and Logic together is that they are fundamentally two parts of the same process. Since we only ever need to update the game’s Logic when the Physical State of the game advances (such as when two objects collide, or cease colliding, and of course when World Object Positions and/or Angles change), it makes all the sense in the world to bundle them together on a single Thread. I name this combined Thread the “Simulation Thread”.

Rendering, on the other hand, has to be performed regardless of the Game State. For example, we need to render the Title Screens when the game launches, the Main Menu before you begin playing the game etc.

Have you spotted a problem yet?

If we’re Simulating on one Thread, and Rendering on another, how do we ensure we’re Rendering a consistent Simulation State?

Rendering a consistent Simulation State

Any competent game engine should employ a comprehensive Event Engine in order to facilitate communication between modules. Not only modules, in fact, as the Event Engine should also handle communications between Game Entities within the game itself.

Rather than me explaining once again what an Event Engine is and how it works, if you don’t already know, you can read my series of articles on Event-Driven, Asynchronous Development with Delphi and the LKSL.

Anyway, a good game engine will employ an Event Engine of some sort, and it is through the use of this Event Engine that we can ensure that our game engine is always rendering a consistent Simulation State.

Simulation and Rendering communication


This flowchart (above) shows how the Simulation Thread communicates with the Rendering Thread using our Event Engine.

It is critical that our Simulation Thread never communicates directly with the Rendering Thread, as we cannot assume that the binary contains the Rendering Thread. This is because the “dedicated server” platform is compiled from the same codebase as the game itself, except that the Rendering Thread is left out of the “dedicated server” executable.

So, by using the Event Engine to handle communication between Threads, we eliminate any potential conflict between the game’s executable and the server’s.

Now, once we’ve progressed the Physics and the Logic on a Tick in the Simulation Thread, we assemble an Event containing all of the Game Entities’ current render data (position, angle, linear and angular velocity) and dispatch that Event through the Event Engine. This Event also includes the Reference Time at which that render data was assembled (that’s actually very important information).

The Event Engine then passes the event along to the Rendering Thread, where those values will be used in the next Tick.

This way, the Rendering Thread is always operating on a complete (and consistent) set of data.
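The article routes this data through its Event Engine (see the LKSL series mentioned above), and that API is not reproduced here. As a stand-in, the following sketch shows the same idea with a plain “latest snapshot” mailbox guarded by a TCriticalSection: the Simulation Thread publishes a complete copy of the render data plus its Reference Time, and the Rendering Thread only ever reads whole snapshots. All type and method names are hypothetical.

```pascal
program RenderSnapshotMailbox;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
{$IFDEF FPC}
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SyncObjs;
{$ELSE}
  System.SyncObjs;
{$ENDIF}

type
  // The render data the article lists for each Game Entity.
  TEntityRenderData = record
    X, Y, Angle: Double;             // position and angle
    VX, VY, AngularVelocity: Double; // linear and angular velocity
  end;
  TEntityRenderDataArray = array of TEntityRenderData;

  // One complete Simulation State, stamped with the Reference Time it was assembled at.
  TRenderSnapshot = record
    ReferenceTime: Double;
    Entities: TEntityRenderDataArray;
  end;

  // Holds only the most recent snapshot; older ones are simply replaced.
  TSnapshotMailbox = class
  private
    FLock: TCriticalSection;
    FLatest: TRenderSnapshot;
    FHasSnapshot: Boolean;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Publish(const ASnapshot: TRenderSnapshot);            // Simulation Thread
    function TryGetLatest(out ASnapshot: TRenderSnapshot): Boolean; // Rendering Thread
  end;

constructor TSnapshotMailbox.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TSnapshotMailbox.Destroy;
begin
  FLock.Free;
  inherited;
end;

procedure TSnapshotMailbox.Publish(const ASnapshot: TRenderSnapshot);
begin
  FLock.Acquire;
  try
    FLatest.ReferenceTime := ASnapshot.ReferenceTime;
    // Copy the array: dynamic arrays are shared by reference, so without this the
    // Simulation Thread's next Tick could change data the renderer is reading.
    FLatest.Entities := Copy(ASnapshot.Entities, 0, Length(ASnapshot.Entities));
    FHasSnapshot := True;
  finally
    FLock.Release;
  end;
end;

function TSnapshotMailbox.TryGetLatest(out ASnapshot: TRenderSnapshot): Boolean;
begin
  FLock.Acquire;
  try
    Result := FHasSnapshot;
    ASnapshot.ReferenceTime := FLatest.ReferenceTime;
    ASnapshot.Entities := Copy(FLatest.Entities, 0, Length(FLatest.Entities));
  finally
    FLock.Release;
  end;
end;

var
  LMailbox: TSnapshotMailbox;
  LTick, LSeen: TRenderSnapshot;
begin
  LMailbox := TSnapshotMailbox.Create;
  try
    LTick.ReferenceTime := 1.5;
    SetLength(LTick.Entities, 1);
    LTick.Entities[0].X := 10;
    LMailbox.Publish(LTick);
    LTick.Entities[0].X := 99; // the Simulation keeps working on its own data
    if LMailbox.TryGetLatest(LSeen) then
      WriteLn('Render snapshot t=', LSeen.ReferenceTime:0:2, ' X=', LSeen.Entities[0].X:0:1);
  finally
    LMailbox.Free;
  end;
end.
```

This prints `Render snapshot t=1.50 X=10.0`: the later change to X made by the simulation side does not leak into the snapshot the renderer reads.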

Using the last complete set of data, and the Reference Time at which it was assembled, the Rendering Thread can then interpolate or extrapolate where on the screen to display each Game Entity, and at what angle.
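As a sketch of the arithmetic involved (the helper names are hypothetical): extrapolation projects the latest snapshot forward using the velocity it carries, while interpolation blends two consecutive snapshots, which assumes the Rendering Thread also keeps the previous snapshot and draws slightly behind the Simulation. The same formulas apply to the Angle, using angular velocity.

```pascal
program RenderPositionSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Linear extrapolation: where an entity moving at a constant velocity should be
// drawn (ANow - ASnapshotTime) seconds after its snapshot was assembled.
function ExtrapolatePosition(APosition, AVelocity, ASnapshotTime, ANow: Double): Double;
begin
  Result := APosition + AVelocity * (ANow - ASnapshotTime);
end;

// Linear interpolation between two consecutive snapshots (AP0 at ATime0,
// AP1 at ATime1), for a render time ANow between the two.
function InterpolatePosition(AP0, AP1, ATime0, ATime1, ANow: Double): Double;
var
  LAlpha: Double;
begin
  if ATime1 <= ATime0 then
    Exit(AP1); // degenerate case: no time between snapshots
  LAlpha := (ANow - ATime0) / (ATime1 - ATime0);
  Result := AP0 + (AP1 - AP0) * LAlpha;
end;

begin
  // Snapshot assembled at t = 1.0 s: entity at X = 10, moving at 2 units/s.
  WriteLn('Extrapolated X at t = 1.5 s: ', ExtrapolatePosition(10, 2, 1.0, 1.5):0:2);
  // Previous snapshot X = 0 at t = 1.0 s, latest X = 10 at t = 2.0 s.
  WriteLn('Interpolated X at t = 1.25 s: ', InterpolatePosition(0, 10, 1.0, 2.0, 1.25):0:2);
end.
```

This prints 11.00 and 2.50.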

Since we’re now using two separate Threads, we have given our game engine the ability to use two separate CPU cores, which means both processes can occur at different rates, entirely asynchronously, and we can smoothly display the Simulation on the screen.

Note that it is also best practice to set realistic Rate Limits on each respective processing Thread, so that each Thread yields its “spare time” to other processes.
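A minimal sketch of a rate-limited Simulation Thread follows, assuming a 30 Hz Tick and Delphi’s TStopwatch (with a millisecond-resolution GetTickCount64 fallback under Free Pascal). The Physics, Logic and Event dispatch are left as comments, and MillisecondsUntil is a hypothetical helper. When a Tick is not yet due, the thread sleeps instead of spinning, which is what hands the spare time back to the rest of the system.

```pascal
program SimulationThreadSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
{$IFDEF FPC}
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;
{$ELSE}
  System.Classes, System.SysUtils, System.Diagnostics;
{$ENDIF}

const
  SIMULATION_INTERVAL = 1 / 30; // seconds between Simulation Ticks

function GetReferenceTime: Double;
begin
{$IFDEF FPC}
  Result := GetTickCount64 / 1000; // millisecond resolution only
{$ELSE}
  Result := TStopwatch.GetTimeStamp / TStopwatch.Frequency;
{$ENDIF}
end;

// How long the thread may sleep before its next Tick is due (never negative).
function MillisecondsUntil(ANextTickTime, ANow: Double): Cardinal;
begin
  if ANextTickTime <= ANow then
    Result := 0
  else
    Result := Trunc((ANextTickTime - ANow) * 1000);
end;

type
  TSimulationThread = class(TThread)
  private
    FTickCount: Integer;
  protected
    procedure Execute; override;
  public
    property TickCount: Integer read FTickCount;
  end;

procedure TSimulationThread.Execute;
var
  LNow, LNextTick: Double;
begin
  LNextTick := GetReferenceTime;
  while not Terminated do
  begin
    LNow := GetReferenceTime;
    if LNow >= LNextTick then
    begin
      // Hypothetical: UpdatePhysics, UpdateLogic, then dispatch the render data Event
      Inc(FTickCount);
      LNextTick := LNextTick + SIMULATION_INTERVAL;
    end
    else
      Sleep(MillisecondsUntil(LNextTick, LNow)); // yield the spare time
  end;
end;

var
  LThread: TSimulationThread;
begin
  LThread := TSimulationThread.Create(False);
  try
    Sleep(500);
    LThread.Terminate;
    LThread.WaitFor;
    WriteLn('Simulation Ticks in about 0.5 s: ', LThread.TickCount);
  finally
    LThread.Free;
  end;
end.
```

Scheduling each Tick from the previous target time (LNextTick + SIMULATION_INTERVAL) rather than from the current time keeps the average rate at 30 Ticks per second; if a Tick runs late, the following Ticks run back-to-back to catch up.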

The only downside to this method is that it is architecturally more complex than the Game Loop method… however, this one downside (which I personally consider to be minor) is massively outweighed by its significant advantages.

In conclusion…

What we have seen here is just one case in which there are many alternative approaches you can take when designing and developing your game engine. While the Game Loop approach might seem like the quickest option, and that might strike you as preferable in the beginning, you will eventually encounter one or more of the many shortcomings of that approach, and end up wishing you’d opted for the more complicated – but ultimately superior – approach from the start.

It is my personal opinion that the Game Loop is an architectural concept we should finally retire from the world of game development, and that it should be viewed as inherently flawed. Design your game engine to be Event-Driven and fully Asynchronous by the use of separate Threads. It may add a few hours to your development time, but this investment pays off in the long-run.

Thank you!

I hope you have learned something from this article. The point of this series is to educate others based on my own experiences, and in so doing “justify” the time I would otherwise have wasted through the flawed design principles I investigated when developing my game engine.

While I’m aware that there aren’t many of us out there designing game engines these days, there are parallels you may recognize in other types of development project… and the lessons game development teaches us have relevance throughout the world of programming.

As always, questions are welcome. If you would like clarification on any of the subject-matter raised in this article, please leave a comment and I shall try to answer as best I can.

Thank you for taking the time to read this article.


28th September 2014
by Simon J Stuart

Making Windows Icons for RAD Studio Applications

Following a recent discussion with Steve Maughan in which he was having difficulties with the Windows icon for his Delphi Applications, I’ve decided to assemble this tutorial on making proper Windows icons for your Applications.

This tutorial will actually show you how to produce “multi-resolution icons”, covering all of the sizes used by the Windows operating system (more precisely, the Explorer Shell).

Even if you’re already producing multi-resolution icons for your Applications, this guide may still be of benefit to you. You may find that the produced ICO files are of a lower overall file size with no loss in quality, or that you don’t have to pay for licenses for any special software that you might currently be using to produce them. You may even just decide that the method covered in this tutorial is easier than whatever method you’re currently using.

What you will need to follow along with this article

While you can design the icon itself in any graphics application you like (so long as it is capable of saving or exporting a PNG with alpha channel), the only application you need to produce your multi-resolution Windows ICO files is GIMP 2.

“GIMP” stands for “GNU Image Manipulation Program”. It is 100% free and open source, with a simple Windows installer.

GIMP is basically a free, open source alternative to Adobe Photoshop, however it provides some very useful features as standard that Photoshop does not have (such as the feature we’ll be using to take our icon graphics and save them as a fully-compliant multi-resolution ICO file).

Because GIMP provides a robust set of tools for composing images, you can use GIMP for the entire process of making the graphics for your icon if you wish. Personally, I’m more used to Adobe Illustrator for doing icon graphics, so I produce the image itself in Illustrator, export each resolution of the icon as a PNG with alpha channel, then use GIMP to produce the ICO file.

Making your icon graphic

My FireMonkey Icon Design


Above is the original Vector Graphic for my Flaming Ape icon composed in Adobe Illustrator.

Once you’ve composed your graphic (regardless of the tool you’re using for that task) you need to save/export scaled versions of the image at the following resolutions:

  • 512×512 (optional! This will work nicely in the ICO file, but I haven’t seen an instance where 512×512 is actually used by the Windows Explorer Shell)
  • 256×256
  • 128×128
  • 96×96
  • 64×64
  • 48×48
  • 32×32
  • 16×16

Note that you can use different graphics for each scale, so if your graphic becomes an incoherent mess at lower resolutions (as my FireMonkey icon does once you get below 48×48), you can make an alternative design for those resolutions so that your icon appears clear regardless of the display size.

These scaled images should be saved as PNG files with an alpha channel (transparency).

Here’s a gratuitous screenshot of the individual scaled images for my icon:

My FireMonkey Icon Scaled Images


Now, you can clearly see that my icon becomes practically incoherent below 48×48, so really I should make an alternate version for 16×16 and 32×32 for more clarity at those resolutions. I’m not going to do that right now (I will eventually) but you should make sure that each scale of your icon looks presentable before compiling your ICO file, and certainly before releasing your application.

Compiling a multi-resolution Windows ICO file from your scaled images

Now it’s time to launch GIMP.

GIMP Splash Screen


Note that the first time launching GIMP can take some time, but subsequent launches are much quicker. I presume this is because GIMP pre-generates certain things on its first execution (I haven’t really looked into it too much, because it’s not a big deal to me).

Once GIMP has loaded, you will see the following screen:

GIMP Main Screen


On GIMP’s main menu, go to File, then click New. Expand the Advanced Options panel and ensure that your settings match those shown in the following screenshot:

GIMP Icon Project Settings


Pay special attention to the Width, Height and Fill with options, which are different from the defaults.

If you’ve set everything up correctly, you will now see the following screen:

GIMP Blank Icon Project


Now we’re ready to drag our scaled PNG images and drop them directly into the Layers tool window, underneath where it shows the Background layer.

Note that the order of the layers does not matter!

Now select the Background layer in the Layers tool window, and delete it using the trash can button on the bottom of the Layers list.

GIMP Background Delete (Button Highlighted)


Okay, now on the main menu, go to File, then click Export or just use the keyboard shortcut Shift+Ctrl+E which does the same thing.

Now you will see an Export Dialog, and by default it will determine the File Type to use by the file extension specified in the Name box.


GIMP Export Dialog

In this case, I’m naming my icon file “My Awesome Icon.ico” (remember, the “.ico” bit at the end is crucial).

When you hit the Export button, you will see another dialog:

GIMP Icon Export Dialog


GIMP automatically determines the correct settings for your ICO file. 256×256 and 512×512 scales will automatically tick the Compressed (PNG) option, and lower resolution versions should have that option unticked.

You’ll note the caution notice at the bottom of the dialog telling you that “Large icons and compression are not supported by all programs”. This most notably refers to older versions of Windows (XP and older), which didn’t support icons of 256×256 or above. This is fine, and your icon will still work properly on XP, Vista, 7, 8 and above.

Press the Export button.

You can now close GIMP.

Now, in the case of my icon demonstrated here, supporting all resolutions up to 256×256, the file size is just 98.5KB. The quality is perfect at all resolutions down to 48×48 (remember, I need to make an alternative version for 16×16 and 32×32 because the size cuts out too much detail).
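As an optional check (not part of the GIMP workflow above), this small console program reads the standard ICO directory of a file and prints each image’s size and whether it is stored PNG-compressed, so you can confirm that every resolution made it into the file. It is a sketch that assumes a little-endian machine and performs only minimal validation; the program and helper names are hypothetical.

```pascal
program ListIcoEntries;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
{$IFDEF FPC}
  Classes, SysUtils;
{$ELSE}
  System.Classes, System.SysUtils;
{$ENDIF}

type
  TIconDir = packed record      // ICONDIR: the 6-byte file header (little-endian)
    Reserved: Word;             // always 0
    ImageType: Word;            // 1 = icon (2 would be a cursor)
    Count: Word;                // number of images in the file
  end;

  TIconDirEntry = packed record // ICONDIRENTRY: one 16-byte entry per image
    Width, Height: Byte;        // 0 means 256
    ColorCount, Reserved: Byte;
    Planes, BitCount: Word;
    BytesInRes: Cardinal;       // size of this image's data
    ImageOffset: Cardinal;      // where this image's data starts in the file
  end;

// The directory stores sizes in a single byte, with 0 standing for 256.
function IconDimension(AValue: Byte): Integer;
begin
  if AValue = 0 then
    Result := 256
  else
    Result := AValue;
end;

// PNG-compressed entries start with the PNG signature ($89 'P' 'N' 'G').
function IsPngSignature(const AMagic: array of Byte): Boolean;
begin
  Result := (Length(AMagic) >= 4) and (AMagic[0] = $89) and (AMagic[1] = $50) and
            (AMagic[2] = $4E) and (AMagic[3] = $47);
end;

procedure ListEntries(AStream: TStream);
var
  LDir: TIconDir;
  LEntry: TIconDirEntry;
  LMagic: array[0..3] of Byte;
  LNextEntry: Int64;
  I: Integer;
begin
  AStream.ReadBuffer(LDir, SizeOf(LDir));
  if (LDir.Reserved <> 0) or (LDir.ImageType <> 1) then
    raise Exception.Create('Not an ICO file');
  for I := 1 to LDir.Count do
  begin
    AStream.ReadBuffer(LEntry, SizeOf(LEntry));
    LNextEntry := AStream.Position;
    AStream.Position := LEntry.ImageOffset; // peek at the start of the image data
    AStream.ReadBuffer(LMagic, SizeOf(LMagic));
    WriteLn(IconDimension(LEntry.Width), 'x', IconDimension(LEntry.Height), ': ',
      LEntry.BytesInRes, ' bytes, PNG-compressed: ', BoolToStr(IsPngSignature(LMagic), True));
    AStream.Position := LNextEntry;
  end;
end;

var
  LFile: TFileStream;
begin
  if ParamCount < 1 then
    WriteLn('Usage: ListIcoEntries "My Awesome Icon.ico"')
  else
  begin
    LFile := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
    try
      ListEntries(LFile);
    finally
      LFile.Free;
    end;
  end;
end.
```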

Thank you!

I hope you found this tutorial useful. As always, your questions are welcome. You may leave a comment and I shall try to answer your questions as best I can.

Update:

Steema (the creators of TeeChart) have released a useful open-source tool to help produce icons and “Splash Images” for FireMonkey applications. Read all about it here!

27th September 2014
by Simon J Stuart

FireMonkey Enhancements – Cached Text Rendering

Slow Text Rendering

Rendering text in FireMonkey is slow. There’s no “nice” way to say that, it’s just the way it is.

Now, one significant way of improving FireMonkey’s text rendering performance is to cache each text element, paired with the attributes used to render it, so that subsequent requests for that same text with those same attributes do not need to repeat the time-consuming computation performed for the initial request.

Instead, a series of optimized lookups are performed to request the cached text element from storage, and draw that into the frame accordingly. This process is vastly less time-consuming at runtime (though, of course, a little complex to produce).
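In outline, the lookup-or-render pattern described above can be sketched with a TDictionary. This is an illustration of the idea, not FMXE’s actual implementation: the class name is hypothetical, the cache key is a plain string, and the “rendered” result is a string standing in for a bitmap.

```pascal
program TextRenderCacheSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
{$IFDEF FPC}
  SysUtils, Generics.Collections;
{$ELSE}
  System.SysUtils, System.Generics.Collections;
{$ENDIF}

type
  // Maps a key describing the text and its render attributes to the result of
  // the expensive render. A real FireMonkey cache would store a bitmap instead.
  TTextRenderCache = class
  private
    FItems: TDictionary<string, string>;
    FRenderCount: Integer;
    function RenderExpensively(const AKey: string): string;
  public
    constructor Create;
    destructor Destroy; override;
    function GetOrRender(const AKey: string): string;
    property RenderCount: Integer read FRenderCount; // times the slow path ran
  end;

constructor TTextRenderCache.Create;
begin
  inherited Create;
  FItems := TDictionary<string, string>.Create;
end;

destructor TTextRenderCache.Destroy;
begin
  FItems.Free;
  inherited;
end;

function TTextRenderCache.RenderExpensively(const AKey: string): string;
begin
  Inc(FRenderCount); // stands in for the slow text layout and rasterisation
  Result := 'rendered(' + AKey + ')';
end;

function TTextRenderCache.GetOrRender(const AKey: string): string;
begin
  // Fast path: a dictionary lookup of a previously rendered element.
  if not FItems.TryGetValue(AKey, Result) then
  begin
    // Slow path: only taken the first time this key is requested.
    Result := RenderExpensively(AKey);
    FItems.Add(AKey, Result);
  end;
end;

var
  LCache: TTextRenderCache;
begin
  LCache := TTextRenderCache.Create;
  try
    LCache.GetOrRender('Hello World|Arial|12|Red');
    LCache.GetOrRender('Hello World|Arial|12|Red');  // served from the cache
    LCache.GetOrRender('Hello World|Arial|12|Blue'); // different attributes: rendered again
    WriteLn('Expensive renders: ', LCache.RenderCount); // prints 2
  finally
    LCache.Free;
  end;
end.
```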

The FireMonkey Enhancements Library [FMXE]

To spare you the troubles of having to produce your own Text Render Cache, I have written one for you… and released it as open-source on GitHub.

Better still, I have also added a Class Helper for TCanvas which introduces the top-level overloaded procedure necessary to render a text element onto said Canvas from the cache!

This Text Render Caching system is just a small part of what aims to be a significant expansion for the FireMonkey framework. The hope is to introduce highly-optimized alternatives for just about every single rendering task, and to package them up into the FMXE library for the benefit of everyone.

The stretch objective is to then build a series of highly-optimized FireMonkey components to replace the standard components, all of which are deliberately designed to use the FMXE methods instead of the built-in, inefficient methods from FireMonkey itself.

The FMXE library will of course provide simple demos to show how each feature is used, and, naturally, the Text Render Caching system is no exception.

This demo, as well as showing how to use the Text Render Caching system, serves as a like-for-like performance benchmark, enabling you to directly compare the standard FireMonkey text rendering solution with my cached counterpart.

Cached Text Rendering using the TCanvas Class Helper

I have attempted to make this system as easy to use as possible, as you shall now see…

uses
  // Normal units here...
  FMXE.Helpers.Canvas;

// Type declarations etc.

implementation

procedure SayHelloWorld(const ACanvas: TCanvas; const AFont: TFont);
begin
  if ACanvas.BeginScene then
  try
    ACanvas.Clear(TAlphaColors.Black); // Ensure the Frame is empty (black screen)
    ACanvas.DrawCachedText(PointF(20, 20),     // Position (Overloaded method takes a TRectF too)
                           45,                 // Angle
                           TAlphaColors.Red,   // Color
                           AFont,              // Font
                           'Hello World',      // Text
                           False,              // Word Wrap
                           1.00,               // Opacity
                           [],                 // Text Flags (none; [TFillTextFlag.RightToLeft] for right-to-left)
                           TTextAlign.Center,  // Horizontal Alignment
                           TTextAlign.Center); // Vertical Alignment
  finally
    ACanvas.EndScene; // Always end the scene, even if drawing raises an exception
  end;
end;

This code snip (above) shows how easy the Text Render Caching system is to use in your programs. Since the TCanvas Class Helper introduces the DrawCachedText method to the TCanvas class, you don’t need to do anything aside from adding the FMXE.Helpers.Canvas reference to your Uses section.

Please note that the Position (or Rect) attribute plays no part in the Caching: it is only used, on a per-call basis, to decide where the cached text element is drawn into the frame. The Angle, on the other hand, is part of what gets cached, and each distinct Angle is cached separately, as image rotation is computationally expensive as well (matrix transformations and whatnot).
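To make the distinction concrete, here is a hypothetical way of building a cache key from a subset of the DrawCachedText attributes (FMXE’s own key format is not shown in this article). The Angle, Color, Font and the like go into the key; the Position does not.

```pascal
program TextCacheKeySketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
{$IFDEF FPC}
  SysUtils;
{$ELSE}
  System.SysUtils;
{$ENDIF}

// Hypothetical key builder. Attributes that change the rendered pixels (including
// the Angle) are part of the key; the draw Position/Rect is deliberately left out,
// because it only decides where the cached element is drawn.
// Strings are length-prefixed so that text 'a|b' with font 'c' can never
// produce the same key as text 'a' with font 'b|c'.
function MakeTextCacheKey(const AText, AFontFamily: string; AFontSize, AAngle,
  AOpacity: Single; AColor: Cardinal; AWordWrap: Boolean): string;
begin
  Result := Format('%d:%s|%d:%s|%g|%g|%g|%d|',
    [Length(AText), AText, Length(AFontFamily), AFontFamily,
     AFontSize, AAngle, AOpacity, Ord(AWordWrap)]) + IntToHex(Int64(AColor), 8);
end;

begin
  WriteLn(MakeTextCacheKey('Hello World', 'Arial', 12, 45, 1, $FFFF0000, False));
end.
```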

The first render request will be just as slow as rendering using FireMonkey’s TCanvas.FillText method, but subsequent requests to render the same text with the same attributes will be astonishingly fast by comparison.

Enjoy!

Feel free to download the FireMonkey Enhancements Library on GitHub and try it out for yourself!
