Handmade Hero Day 004
NOTE(bk): These notes will change. At the moment they are my opinion or interpretation of what Casey says during the video. I will revise them so that they follow what he describes in the video, with only the rare aside from me.
StretchDIBits vs BitBlt
Casey: Chris Hecker messaged him about this after Casey had trouble remembering the details on stream yesterday. He normally does not use StretchDIBits.
BitBlt does not take a pointer to the bits. It has to go device context to device context. Which means you have a DC that you leave around that has the bitmap in it, and you always do the BitBlt from that DC to the Screen/Window DC, that you are actually drawing to. Which is why Casey did all the stuff like CreateCompatibleDC.
So why did Casey always use BitBlt? StretchDIBits used to be the slow path. BitBlt was faster because Windows™ did the allocation of the memory itself and already had the bitmap selected into a DC.
Since the point here is for us to write our own renderer, and that performance difference may not even exist anymore, there is probably no reason to ever use BitBlt. StretchDIBits should be fine for us. Eventually, when we optimize things, we will want to create an OpenGL context and write directly to a texture.
Update to win32_handmade.cpp
Casey: We no longer need to store the BitmapDeviceContext and the BitmapHandle. We can then allocate our own memory, however we want to. And as long as it is the right size and works okay, Windows will go ahead and do the right thing.
How much memory do we need?
Casey: We need 32 bits per pixel: 8 bits red, 8 bits green, 8 bits blue, and 8 bits of padding. The reason is that, in general, on x86 architecture there is a penalty for unaligned access.
Thus, he needs 4 bytes * Width * Height of memory.
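As a rough illustration of the arithmetic above, here is a minimal sketch that computes the buffer size for a given bitmap. The function name BitmapMemorySize is hypothetical (the notes later use a local variable of that name), and using std::size_t instead of int is an assumption made here to avoid overflow on very large bitmaps.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a Width x Height bitmap at 4 bytes (32 bits) per pixel.
// Hypothetical helper for illustration; not part of win32_handmade.cpp.
std::size_t BitmapMemorySize(int Width, int Height)
{
    assert(Width >= 0 && Height >= 0);
    const std::size_t BytesPerPixel = 4; // blue, green, red, padding
    return static_cast<std::size_t>(Width) *
           static_cast<std::size_t>(Height) *
           BytesPerPixel;
}
```

For example, a 1280 x 720 bitmap would need 1280 * 720 * 4 = 3,686,400 bytes.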
Penalty for unaligned accessing
Casey: Whether you operate on 8, 16, 32, or 64 bits, values should generally be aligned to their size. For example, 32-bit values should be aligned on 32-bit boundaries: they are 4 bytes long, so they should start on a multiple of 4 bytes in memory. Thus he is asking for 8 extra bits that are simply padding and are never used. Doing so guarantees that our pixels are always aligned on 4-byte boundaries.
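To make the alignment argument concrete, the sketch below compares byte offsets of pixels packed at 4 bytes per pixel against a hypothetical 3 bytes per pixel layout. It assumes the buffer itself starts on a 4-byte boundary (memory from VirtualAlloc is page aligned, so that holds there). The helper names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pixel number Index in a tightly packed run of pixels.
std::size_t PixelByteOffset(std::size_t Index, std::size_t BytesPerPixel)
{
    assert(BytesPerPixel > 0);
    return Index * BytesPerPixel;
}

// True when Offset is a multiple of Alignment.
bool IsAligned(std::size_t Offset, std::size_t Alignment)
{
    assert(Alignment > 0);
    return (Offset % Alignment) == 0;
}
```

With 4 bytes per pixel every offset (0, 4, 8, ...) is a multiple of 4. With 3 bytes per pixel the offsets are 0, 3, 6, 9, 12, ..., so only every fourth pixel would start on a 4-byte boundary.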
Creating memory in Windows
Casey: There are several different methods that we can use. He likes using VirtualAlloc. Another option is HeapAlloc.
When you allocate memory from the system, it is usually allocated in pages. Pages are a certain size, in Windows they are often 4096 bytes. Sometimes they are 64 KB, it depends whether they are large pages or small pages.
HeapAlloc asks the system to sub-allocate out of the pages for you, so you can pass any size you want and you will get back a pointer to that.
VirtualAlloc is a little more raw, for it must give you back whole pages. You thus cannot ask for something smaller than 4096 bytes and receive exactly that. It will simply give you the whole page, and the rest will just be wasted.
We will be doing all of our allocation almost entirely ourselves, so he wants to get started using VirtualAlloc so we are used to calling it.
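The page rounding described above can be sketched as plain arithmetic. This is only an illustration: RoundUpToPage is a hypothetical helper, and the 4096-byte page size is an assumption taken from the text. Real code could query the actual page size through GetSystemInfo (the dwPageSize field of SYSTEM_INFO).

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of PageSize that is at least Size.
// Hypothetical helper; it models how a page-granular allocator rounds a request.
std::size_t RoundUpToPage(std::size_t Size, std::size_t PageSize)
{
    assert(PageSize > 0);
    std::size_t Pages = (Size + PageSize - 1) / PageSize;
    return Pages * PageSize;
}
```

Under that assumption, a request for 1 byte occupies 4096 bytes, and a request for 4097 bytes occupies 8192 bytes.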
VirtualAlloc function
LPVOID WINAPI VirtualAlloc(
_In_opt_ LPVOID lpAddress,
_In_ SIZE_T dwSize,
_In_ DWORD flAllocationType,
_In_ DWORD flProtect
);
VirtualAlloc Parameters
lpAddress [InOpt]
Type: LPVOID
The starting address of the region to allocate. If the parameter is NULL, the system determines where to allocate the region.
Casey: Every process has a virtual memory space, so this would be good if we actually want to place this specifically somewhere within our virtual memory.
dwSize [In]
Type: SIZE_T
The size of the region in bytes. If the lpAddress parameter is NULL, this value is rounded up to the next page boundary.
flAllocationType [In]
Type: DWORD
Casey: There are basically two types: commit and reserve. Each process starts off with a 64-bit address space, but all of it is essentially vacant; it is not actually memory yet. Commit tells Windows that you will actually use this memory and that it should start tracking it. Reserve asks for a range of addresses without Windows needing to start tracking memory for it yet.
Value | Meaning |
---|---|
MEM_COMMIT | Allocates memory charges for the specified reserved memory pages. |
MEM_RESERVE | Reserves a range of the process’ virtual address space without allocating any actual physical storage in memory or in the paging file on disk. |
Casey: The other values of allocation types we don’t need to worry about at the moment.
flProtect [In]
Type: DWORD
The memory protection for the region of pages to be allocated. If the pages are being committed, you can specify any one of the memory protection constants.
Casey: We do not need to execute code out of this memory, so we do not need to set the execute bit. This is for security purposes, or forking purposes. We just want full read and write access, thus our process can both look at the contents of the memory and store things to memory.
Value | Description |
---|---|
PAGE_READWRITE | Enables read-only or read/write access to the committed region of pages. |
VirtualAlloc Return value
Casey: It returns an LPVOID, which is a long pointer to void (that is, a void *).
If the function succeeds, the return value is the base address of the allocated region of pages.
If the function fails, the return value is NULL. To get extended error information, call GetLastError.
Freeing the memory
Casey: Whenever Win32ResizeDIBSection gets called, we first check whether BitmapMemory has already been allocated, and if so, we free it before allocating new memory. To free it, we use VirtualFree.
VirtualFree function
BOOL WINAPI VirtualFree(
_In_ LPVOID lpAddress,
_In_ SIZE_T dwSize,
_In_ DWORD dwFreeType
);
VirtualFree Parameters
dwSize [In]
Type: SIZE_T
The size of the region of memory to be freed, in bytes.
If the dwFreeType parameter is MEM_RELEASE, this parameter must be 0 (zero). This function frees the entire region that is reserved in the initial allocation call to VirtualAlloc.
dwFreeType [In]
Type: DWORD
The type of the free operation.
Casey: We will use MEM_RELEASE to free the memory entirely. MEM_DECOMMIT decommits the pages but keeps them reserved.
Brian: Is the reason Casey does not want to reuse the same memory region that we do not know what size we are changing to, and thus how many pages are required, and we would not want them fragmented? Could they be fragmented?
Future, using VirtualProtect for debugging purposes
Casey: You can use VirtualProtect to change the protect level to PAGE_NOACCESS, which we could do in debug mode. The reason for doing this is so that nobody has a stale pointer to the page. Because if someone kept a pointer to the page and they were going to try to write to it that would be a “use after free” bug, which are very hard to track down sometimes. So this is one way that you can catch those bugs.
Win32ResizeDIBSection update to use VirtualAlloc
Edit Win32ResizeDIBSection in win32_handmade.cpp:
// We first check to see if BitmapMemory exists, and free it if so.
if (BitmapMemory)
{
    VirtualFree(BitmapMemory, 0, MEM_RELEASE);
}
// BitmapInfo initialization
// ...
// The memory we need is the area of the bitmap multiplied by the size of each pixel.
// Since a pixel is 32 bits, or 4 bytes, we multiply the area by 4.
int BytesPerPixel = 4;
int BitmapMemorySize = (Width*Height)*BytesPerPixel;
BitmapMemory = VirtualAlloc(0, BitmapMemorySize, MEM_COMMIT, PAGE_READWRITE);
Updating Win32UpdateWindow to use BitmapMemory
Casey refers to the initial implementation of Win32UpdateWindow as a dirty rectangle update: we pass Windows only the region that it asked us to paint. He says that bugs could easily be introduced with this method, so instead he will fill the whole window.
global_variable int BitmapWidth;
global_variable int BitmapHeight;
internal void
Win32ResizeDIBSection(int Width, int Height)
{
    // rem: ...
    if (BitmapMemory)
    {
        VirtualFree(BitmapMemory, 0, MEM_RELEASE);
    }
    // Width and Height describe the bitmap, so we store them in the bitmap globals.
    BitmapWidth = Width;
    BitmapHeight = Height;
    // rem: ... (update Width -> BitmapWidth, Height -> BitmapHeight)
}
internal void
Win32UpdateWindow(HDC DeviceContext, RECT *WindowRect, int X, int Y, int Width, int Height)
{
    int WindowWidth = WindowRect->right - WindowRect->left;
    int WindowHeight = WindowRect->bottom - WindowRect->top;
    StretchDIBits(DeviceContext,
                  0, 0, BitmapWidth, BitmapHeight, // Destination
                  0, 0, WindowWidth, WindowHeight, // Source
                  BitmapMemory,
                  &BitmapInfo,
                  DIB_RGB_COLORS,
                  SRCCOPY);
}
How to store the pixels in memory (logically)
Casey: We have a 2D object, but a 1D representation of it (in memory).
Once you get to the end of a row, where do you put the next row? The value that you add to a pointer to the start of one row to move it to the start of the next row is called the Pitch. The Stride is the value that you add to a pointer at the end of a row to move it to the start of the next row.
We need to know what StretchDIBits uses. This depends on the biHeight value (in the bmiHeader of the BitmapInfo). If biHeight is positive, the bitmap is a bottom-up DIB and its origin is the lower-left corner. If biHeight is negative, the bitmap is a top-down DIB and its origin is the upper-left corner.
Casey prefers top-down, so he negates the BitmapHeight when setting the biHeight. He notes that Windows™ uses top-down coordinates: X begins at the left and gets bigger as it goes to the right, and Y starts at zero at the top and grows as it goes to the bottom.
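To show how the pitch is used in a top-down buffer, here is a small sketch that computes the address of the pixel at (X, Y). PixelAddress is a hypothetical helper, not something from the video, and it assumes rows are stored top to bottom with a constant Pitch in bytes between row starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address of pixel (X, Y) in a top-down buffer:
// skip Y whole rows using the pitch, then X pixels within the row.
std::uint8_t *PixelAddress(void *Memory, int Pitch, int BytesPerPixel, int X, int Y)
{
    assert(X >= 0 && Y >= 0);
    std::uint8_t *Row = static_cast<std::uint8_t *>(Memory) +
                        static_cast<std::ptrdiff_t>(Y) * Pitch;
    return Row + static_cast<std::ptrdiff_t>(X) * BytesPerPixel;
}
```

For a bitmap 4 pixels wide at 4 bytes per pixel, the pitch is 16, so pixel (0, 1) sits 16 bytes into the buffer and pixel (3, 1) sits 28 bytes in.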
Edit Win32ResizeDIBSection in win32_handmade.cpp:
// rem: Updating Win32ResizeDIBSection BitmapInfo initialization
// Negate the height so that StretchDIBits will treat this as top-down DIB.
BitmapInfo.bmiHeader.biHeight = -BitmapHeight;
Casey notes that he could not find in the MSDN documentation what the Stride of either StretchDIBits or BitmapInfo is, so he is going to investigate.
-FC - Full path of the filename in a compile error (cl command line switches)
Adds -FC to the command line in build.bat so that it will return the full path of the filename.
Updating the prototype of Win32UpdateWindow - Add parameter
Because we added a new parameter to Win32UpdateWindow, we need to update the code that calls this function. The call is made from within the WM_PAINT message handler, and the RECT we need is what we get from GetClientRect, which we also used in WM_SIZE.
Edit Win32UpdateWindow in win32_handmade.h:
internal void
Win32UpdateWindow(HDC DeviceContext, RECT *ClientRect, int X, int Y, int Width, int Height);
TODO(bk): Verify that this is in .h
typedef for specific size integers
// The old way of creating specific size integer types
typedef unsigned char uint8;
// A later standard added this header, which defines fixed-size types for us
#include <stdint.h>
typedef uint8_t uint8;
He then goes on to create the remaining:
typedef int8_t int8;
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;
typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;
Drawing the bits to the screen
In our case, the pitch is Width * BytesPerPixel.
Bytes: 0 1 2 3
Pixel in memory: 00 00 00 00
Which is the red, green, blue and padding byte?
Perhaps the common-sense way to think about it would be:
Pixel in memory: RR GG BB xx
But it turns out that the first byte is blue! Why is it blue? Little-endian architecture: the first byte in memory is loaded into the lowest byte of the register.
Which would make the register: 0x xx BB GG RR
But the people who wrote Windows™ did not like that very much, so they swapped the order in memory.
In the end the register is: 0x xx RR GG BB
So in all Windows™ bitmaps the blue byte is first, the green byte is second, the red byte is third, and the pad byte is at the end.
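The byte order above can be checked with a small experiment. This is a sketch under the assumption of a little-endian machine (which x86 is); PackPixel and ByteInMemory are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack channels into the register layout 0x xx RR GG BB (padding left as zero).
std::uint32_t PackPixel(std::uint8_t Red, std::uint8_t Green, std::uint8_t Blue)
{
    return (static_cast<std::uint32_t>(Red) << 16) |
           (static_cast<std::uint32_t>(Green) << 8) |
           static_cast<std::uint32_t>(Blue);
}

// Byte found at position ByteIndex (0..3) when the pixel is stored to memory.
// On a little-endian machine index 0 holds the lowest byte of the register.
std::uint8_t ByteInMemory(std::uint32_t Pixel, int ByteIndex)
{
    assert(ByteIndex >= 0 && ByteIndex < 4);
    std::uint8_t Bytes[4];
    std::memcpy(Bytes, &Pixel, sizeof(Bytes));
    return Bytes[ByteIndex];
}
```

On a little-endian machine, PackPixel(0x11, 0x22, 0x33) gives 0x00112233 in the register, and the bytes in memory read 33 22 11 00, which is the BB GG RR xx order described above.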
Drawing the bits implementation (8 bit update)
Edit Win32ResizeDIBSection in win32_handmade.cpp:
// Updating Win32ResizeDIBSection first. Later we should pull this functionality
// into its own function.
int Pitch = Width * BytesPerPixel;
uint8 *Row = (uint8 *)BitmapMemory;
for (int Y = 0;
     Y < BitmapHeight;
     ++Y)
{
    uint8 *Pixel = (uint8 *)Row;
    for (int X = 0;
         X < BitmapWidth;
         ++X)
    {
        // Blue
        *Pixel = (uint8)X;
        ++Pixel;
        // Green
        *Pixel = (uint8)Y;
        ++Pixel;
        // Red
        *Pixel = 0;
        ++Pixel;
        // Padding
        *Pixel = 0;
        ++Pixel;
    }
    Row += Pitch;
}
Aside: Casey notes later that a more concise way of assigning the pixels would be:
// Instead of writing like this:
*Pixel = (uint8)X;
++Pixel;
// We can use operator precedence and change to
*Pixel++ = (uint8)X;
Pull out rendering code to its own function called RenderWeirdGradient
Add RenderWeirdGradient to win32_handmade.cpp:
// For now, we need to pull this out of Win32ResizeDIBSection and make it global
global_variable int BytesPerPixel = 4;
internal void
RenderWeirdGradient(int XOffset, int YOffset)
{
    int Width = BitmapWidth;
    int Height = BitmapHeight;
    // Pull out all rendering code defined above in 8-bit implementation
    // Update setting the blue and the green to:
    *Pixel = (uint8)(X + XOffset);
    ++Pixel;
    *Pixel = (uint8)(Y + YOffset);
    ++Pixel;
}
To reproduce what we had before, simply add:
Edit Win32ResizeDIBSection in win32_handmade.cpp:
// pos: Where the render logic was located, replace with:
RenderWeirdGradient(0, 0);
Changing GetMessage to PeekMessage
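The issue with GetMessage is that if you call it and there are no messages, it will block and wait for another message.
In order to animate our window, we cannot have GetMessage block, so we switch to PeekMessage, which returns immediately.
GetMessage returned zero when it received WM_QUIT, which ended the old loop. With PeekMessage we now check ourselves whether the message is WM_QUIT and, if so, set Running to false.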
PeekMessage function
BOOL WINAPI PeekMessage(
_Out_ LPMSG lpMsg,
_In_opt_ HWND hWnd,
_In_ UINT wMsgFilterMin,
_In_ UINT wMsgFilterMax,
_In_ UINT wRemoveMsg
);
PeekMessage Parameters
It is pretty much equivalent to GetMessage, with the exception being the last parameter.
wRemoveMsg [In]
Type: UINT
Casey: A flag telling Windows what it should do and since we want to process the messages ourselves, we want to use PM_REMOVE.
Value | Meaning |
---|---|
PM_REMOVE | Messages are removed from the queue after processing by PeekMessage. |
PeekMessage Return value
Type: BOOL
If a message is available, the return value is nonzero.
If no messages are available, the return value is zero.
Update message loop to use PeekMessage
Edit WinMain in win32_handmade.cpp:
while (Running)
{
    MSG Message;
    while (PeekMessage(&Message, 0, 0, 0, PM_REMOVE))
    {
        if (Message.message == WM_QUIT)
        {
            Running = false;
        }
        TranslateMessage(&Message);
        DispatchMessageA(&Message);
    }
}
Brian: We have slightly changed the logic. Before, we would immediately quit out of the loop if the message was WM_QUIT or below zero (indicates an error). Should we not do the same?
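The non-blocking pattern can be illustrated without Win32 at all. The sketch below is not Casey's code: MockQueue, MockMessage, and RunLoop are hypothetical stand-ins, with Peek playing the role of PeekMessage with PM_REMOVE. It drains every pending message, then does one frame of work, and repeats. As one possible variant, it skips the frame once a quit message has been seen.

```cpp
#include <cassert>
#include <deque>

enum MockMessage { MOCK_PAINT, MOCK_QUIT };

// Hypothetical stand-in for the Windows message queue.
struct MockQueue
{
    std::deque<MockMessage> Pending;

    // Non-blocking: returns false immediately when nothing is queued,
    // otherwise removes the front message and returns it through Out.
    bool Peek(MockMessage &Out)
    {
        if (Pending.empty())
        {
            return false;
        }
        Out = Pending.front();
        Pending.pop_front();
        return true;
    }
};

// Runs until MOCK_QUIT is seen or MaxFrames frames were rendered.
// Returns the number of frames rendered.
int RunLoop(MockQueue &Queue, int MaxFrames)
{
    assert(MaxFrames >= 0);
    bool Running = true;
    int Frames = 0;
    while (Running && Frames < MaxFrames)
    {
        MockMessage Message;
        while (Queue.Peek(Message))
        {
            if (Message == MOCK_QUIT)
            {
                Running = false;
            }
        }
        if (Running)
        {
            ++Frames; // stand-in for rendering and blitting one frame
        }
    }
    return Frames;
}
```

With an empty queue the loop keeps producing frames instead of waiting, which is the behavior needed for animation. MaxFrames exists only so the example terminates.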
Add animation to our window
Since we are no longer blocked by Windows while waiting for messages, we can continuously update our window. For now the idea is simply to increment the XOffset by one each loop and then call RenderWeirdGradient.
Add RenderWeirdGradient and Win32UpdateWindow to main loop
Now that we want to animate the gradient (what we are writing to the screen), we need to move RenderWeirdGradient (Casey’s function to actually populate the memory of the bitmap) and Win32UpdateWindow to the main loop.
Edit WinMain in win32_handmade.cpp:
if (WindowHandle)
{
    Running = true;
    int XOffset = 0;
    int YOffset = 0;
    while (Running)
    {
        // Handle messages
        RenderWeirdGradient(XOffset, YOffset);
        HDC DeviceContext = GetDC(WindowHandle);
        RECT ClientRect;
        GetClientRect(WindowHandle, &ClientRect);
        int ClientWidth = ClientRect.right - ClientRect.left;
        int ClientHeight = ClientRect.bottom - ClientRect.top;
        Win32UpdateWindow(DeviceContext, &ClientRect, 0, 0, ClientWidth, ClientHeight);
        ReleaseDC(WindowHandle, DeviceContext);
        ++XOffset;
    }
}
Rename WindowHandle to Window
From within WinMain, after RegisterClassA(&WindowClass), when creating a window, he names it WindowHandle. Rename this to Window.
Rename WindowRect to ClientRect
Throughout win32_handmade.cpp, Casey renames every instance of WindowRect to ClientRect.
The updated code:
if (Window)
{
    Running = true;
    int XOffset = 0;
    int YOffset = 0;
    while (Running)
    {
        // Handle messages
        RenderWeirdGradient(XOffset, YOffset);
        HDC DeviceContext = GetDC(Window);
        RECT ClientRect;
        GetClientRect(Window, &ClientRect);
        int WindowWidth = ClientRect.right - ClientRect.left;
        int WindowHeight = ClientRect.bottom - ClientRect.top;
        Win32UpdateWindow(DeviceContext, &ClientRect, 0, 0, WindowWidth, WindowHeight);
        ReleaseDC(Window, DeviceContext);
        ++XOffset;
    }
}
GetDC function
HDC GetDC(
_In_ HWND hWnd
);
ReleaseDC function
int ReleaseDC(
_In_ HWND hWnd,
_In_ HDC hDC
);
How fast can we update the screen using the old school way of updating?
One thing Casey would like to do is test how long it takes to update the screen. He says that he won’t be able to due to OBS.
Writing 32 bits is faster than 8 bits x 4
Casey changes the code so that instead of setting each byte separately, he sets the whole Pixel, which is 32 bits, in one write.
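uint32 *Pixel = (uint32 *)Row;
for (int X = 0;
     X < BitmapWidth;
     ++X)
{
    // The casts keep only the low 8 bits, so the gradient wraps every 256 pixels.
    uint8 Blue = (uint8)(X + XOffset);
    uint8 Green = (uint8)(Y + YOffset);
    // Green goes in bits 8..15, blue in bits 0..7; red and padding stay zero.
    *Pixel++ = ((Green << 8) | Blue);
}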
Memory: BB GG RR xx
Register: xx RR GG BB
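For experimenting with the 32-bit write away from Windows, here is a portable sketch of the same gradient. It is an approximation rather than the code from the video: RenderGradient is a hypothetical name, and a std::vector stands in for BitmapMemory, so the pitch is implicitly Width * 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills a Width x Height buffer, one 32-bit write per pixel.
// Register layout per pixel is 0x xx RR GG BB, with red and padding left at zero.
std::vector<std::uint32_t> RenderGradient(int Width, int Height, int XOffset, int YOffset)
{
    assert(Width >= 0 && Height >= 0);
    std::vector<std::uint32_t> Pixels(static_cast<std::size_t>(Width) *
                                      static_cast<std::size_t>(Height));
    std::size_t Index = 0;
    for (int Y = 0; Y < Height; ++Y)
    {
        for (int X = 0; X < Width; ++X)
        {
            // Truncation to 8 bits makes the gradient wrap every 256 pixels.
            std::uint8_t Blue = static_cast<std::uint8_t>(X + XOffset);
            std::uint8_t Green = static_cast<std::uint8_t>(Y + YOffset);
            Pixels[Index++] = (static_cast<std::uint32_t>(Green) << 8) | Blue;
        }
    }
    return Pixels;
}
```

Calling it with an increasing XOffset each frame reproduces the horizontal scrolling: the pixel at X = 0 takes the value that the pixel at X = 1 had on the previous frame.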
004 - Q & A
Do you know what the PAGE_WRITECOPY memory protection flag does?
He goes into explaining it, but one of the things that came out was in Unix you can fork a process, which means that there will be a second copy, a clone of the process.
Bytes per pixel and bitmap memory layout
                 Width ->
                 0                                    Width*BytesPerPixel
BitmapMemory ->  0  BB GG RR xx  BB GG RR xx  BB GG RR xx ...
BM + Pitch       1  BB GG RR xx  BB GG RR xx  BB GG RR xx ...
Why was the blue channel set instead of the padding?
MEMORY ORDER: RR GG BB xx
LOADED IN: xx BB GG RR
WANTED: xx RR GG BB
MEMORY ORDER: BB GG RR xx
static keyword, global variables versus local functions
Marking a global variable static means that nothing outside of the translation unit (most of the time, the file) can use that variable.
Whereas without the static keyword, other files can: they can declare extern int Foo and thus use Foo.
For internal (a static function), it means that other files cannot call the function by name. They can still call it, though, if someone passes them a pointer to the function.
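The point about calling a static function through a pointer can be sketched in a single file. The names here (Square, GetSquareFunction, CallThroughPointer, unary_fn) are hypothetical, and the internal macro is assumed to be defined as static, matching how these notes use it. CallThroughPointer stands in for code that would live in another file and only ever sees the pointer.

```cpp
#include <cassert>

#define internal static // assumed definition of the alias used in these notes

typedef int (*unary_fn)(int);

// Internal linkage: the name Square is not visible to other translation units.
internal int Square(int Value)
{
    return Value * Value;
}

// External linkage: other files may declare and call this by name,
// and through it obtain a pointer to the otherwise hidden function.
unary_fn GetSquareFunction()
{
    return Square;
}

// Stand-in for code in another file that only holds the pointer.
int CallThroughPointer(unary_fn Function, int Value)
{
    assert(Function != nullptr);
    return Function(Value);
}
```

Another file could not write Square(7) directly, but CallThroughPointer(GetSquareFunction(), 7) would still run Square and return 49.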
Notes
ctime stats
D:\work\handmade\local\handmade>ctime -stats project\handmade.ctm
project\handmade.ctm Statistics
Total complete timings: 51
Total incomplete timings: 0
Days with timings: 3
Days between first and last timing: 6
Timings marked successful (33):
Slowest: 1.855 seconds
Fastest: 1.046 seconds
Average: 1.224 seconds
Total: 40.413 seconds
Timings marked failed (18):
Slowest: 1.466 seconds
Fastest: 0.968 seconds
Average: 1.121 seconds
Total: 20.184 seconds
All (0.200000 days/bucket):
| * 1.855 seconds
|* *
|* * *
|* * *
|* * *
|* * *
|* * *
|* * *
|* * *
|* * *
+------------------------------ 0.000 seconds
| * 22
| *
|* *
|* *
|* *
|* * *
|* * *
|* * *
|* * *
|* * *
+------------------------------ 0
Recent (1.000000 day/bucket):
| * 1.855 seconds
| * *
| * * *
| * * *
| * * *
| * * *
| * * *
| * * *
| * * *
| * * *
+------------------------------ 0.000 seconds
| * 22
| *
| * *
| * *
| * *
| * * *
| * * *
| * * *
| * * *
| * * *
+------------------------------ 0
Total time spent: 1 minute, 0.597 seconds