DMA & HDMA – Super Nintendo Entertainment System Features Pt. 07


This video is part 7 in a series about Super
Nintendo Entertainment System features. This time we’ll learn how the processor communicates
with external memories such as OAM, VRAM, and CGRAM. This will lead into Direct Memory Access,
DMA, and eventually H-Blank DMA. Some of the devices the processor communicates
with in addition to main memory are MMIO devices, which stands for Memory Mapped Input/Output. The amount of physical RAM in the Super NES
is much less than the entire 24-bit address space of 2^24 bytes (16 MiB). As you will find out in a future video, some
of these addresses in the address space are reserved for other things like controllers,
cartridge ROM, and SRAM. However, there are memories that are not memory
mapped, and must be accessed indirectly. These include OAM, CGRAM, and VRAM. Instead, a memory mapped hardware register
is used as a pointer into one of these areas, and another couple of registers are used to read
from and write to the area at the memory location specified by the former. For example, to write something in the first
location of CGRAM, first the CGRAM address register at $2121 must be set to zero, then
the data to write should be written to $2122. Whenever a value is read from or written to
these registers, the corresponding address register is incremented automatically, so
data can be easily accessed sequentially. Here is a table that shows which registers
are used for each memory. It is important to note that in these three
locations, the data held at each address is 16 bits instead of 8 bits like in the main
memory. In the case of video RAM, these 16 bits are
spread out between two registers, but in OAM and CGRAM’s case, the 16 bits are written
8 at a time in a single register. In addition to these 3 memories, work RAM
can also be read or written to indirectly, using its hardware registers. This might seem useless, but it allows direct
memory access to WRAM, which results in a very speedy operation. If you haven’t figured it out by now, these special
hardware registers are used as part of the direct memory access process. Direct Memory Access is just a way to quickly
move data from one place to another. More specifically, it allows transferring
data between the A bus, a.k.a. CPU-space, a.k.a. anywhere in the 24-bit address space,
and the B bus, a.k.a. PPU-space, a.k.a. any of the 256 registers of $2100 through $21FF. The reason it is so fast is that it
directly connects the data’s source to its destination with no middle man, which would
normally be the processor. Using the processor to load bytes from one
place and store them to another is relatively slow, resulting in about 336 kilobytes/s. Even quick pop slides, or the move
block instructions specifically meant for moving large blocks of memory, only reach 358
and 413 kilobytes/s respectively. DMA, however, can reach speeds of up to 2.68
megabytes/s. This is incredibly useful when moving around
huge amounts of data, like graphics. The SNES has 8 DMA channels, which means 8
transfers can be prepared at once, and then initiated sequentially one after the other. Each channel has 12 dedicated registers, 7
of which need to be populated with the DMA properties that explain how the transfer will
work. Three registers are used to determine the
24-bit address on the A bus to transfer, while another register is used for the 8-bit address
on the B bus. Two registers are used to hold the number
of bytes to transfer in total, and the final register holds all the other properties for
the DMA setup. This includes the direction in which the data is
transferred, whether to automatically increment or decrement the A bus address,
and the format of one unit of data and how it should be sent over the B bus. As an example, here is how one would go about
transferring a large block of graphics data from ROM to video RAM using DMA channel 0. First, the target VRAM address should be written
to $2116 and $2117, and the B bus address should be set to $18, the low byte of the address of the VRAM
data I/O register, $2118. Second, the A bus address should be set to
the location in ROM where the graphics data is stored, and the total number of bytes should
be specified as well. Third, the transfer properties should be set–in
this case, the transfer occurs from A bus to B bus, the A bus address should be incremented
after each byte transferred, and the transferred bytes should be written two at a time in the
order of low and high to match the VRAM data I/O registers. Finally, the DMA can be initiated by setting
the channel’s corresponding bit in the DMA enable register at $420B.
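
As a rough illustration of the four steps above, here is a Python sketch that lists the register writes in order. It only builds a list of (register, value) pairs and never touches real hardware. The function name and the example addresses (VRAM word address $1000, ROM address $018000, $0800 bytes) are hypothetical. The byte count registers ($43x5 and $43x6) and the value $01 for the properties register (A bus to B bus, increment, two registers written alternately) come from common SNES register documentation rather than from the narration.

```python
def dma_to_vram_writes(channel, vram_word_addr, src_addr, length):
    """Return the (register, value) byte writes for a ROM-to-VRAM DMA.

    Illustrative only: this models the order of writes described in the
    narration; it is not an emulator and performs no transfer.
    """
    if not 0 <= channel <= 7:
        raise ValueError("the SNES has DMA channels 0 through 7")
    if not 1 <= length <= 0xFFFF:
        raise ValueError("this sketch expects a byte count of 1..$FFFF")

    base = 0x4300 + (channel << 4)  # channel n uses registers $43n0 and up

    return [
        # Step 1: target VRAM address, then the B bus address ($18 -> $2118).
        (0x2116, vram_word_addr & 0xFF),
        (0x2117, (vram_word_addr >> 8) & 0xFF),
        (base + 1, 0x18),
        # Step 2: 24-bit A bus source address (low, high, bank) and byte count.
        (base + 2, src_addr & 0xFF),
        (base + 3, (src_addr >> 8) & 0xFF),
        (base + 4, (src_addr >> 16) & 0xFF),
        (base + 5, length & 0xFF),
        (base + 6, (length >> 8) & 0xFF),
        # Step 3: properties. Bit 7 clear = A bus to B bus, increment A bus,
        # mode 1 = write low byte to $2118 and high byte to $2119 in turn.
        (base + 0, 0x01),
        # Step 4: start the transfer by setting this channel's bit in $420B.
        (0x420B, 1 << channel),
    ]


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    for register, value in dma_to_vram_writes(0, 0x1000, 0x018000, 0x0800):
        print(f"${register:04X} <- ${value:02X}")
```
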
The least significant bit corresponds to channel 0, while the most significant bit is for channel
7. At this point, the processor would be suspended
from executing instructions while the DMA transfer is occurring; once the transfer is
complete it will be reactivated and execution continues normally. While DMA can generally be performed at any
time, most of the registers accessible via the B bus are used by the Picture Processing
Unit to display the image to the screen. Therefore, in order to read and write values
to these registers, care needs to be taken so the PPU and CPU don’t clash with each other
while reading or writing data. Specifically, data should only be read or
written by the CPU during one of the blanking periods: H-blank, V-blank, or F-blank (forced blanking). As shown in the previous video, enabling forced
blanking in the middle of rendering can lead to a black streak across the screen, and trying
to time this maneuver to occur exactly during H-blank can be difficult. Luckily, there is an easy way to do this,
and it is called H-Blank direct memory access, or HDMA for short. HDMA and DMA both use the same 8 transfer
channels, so they can’t use the same channel at the same time. They also use the same set of 12 registers,
although for HDMA only 5 need to be set manually beforehand, and 6 more are updated automatically
during the transfer. One register holds the properties for the
HDMA setup, and one holds the 8-bit address on the B bus just like the general purpose
DMA setup. The last three registers hold an address on
the A bus that points to what is called an HDMA table. All of this data only needs to be set once
before initiating the HDMA by setting the channel’s corresponding bit in the HDMA enable
register at $420C. After that, the data found in the HDMA table
will be transferred to the specified hardware register automatically during H-blank at the
scanlines specified in the table itself. The HDMA table is fairly straightforward and
just includes instructions on what data should be transferred and when it should be transferred. It includes a list of data entries followed
by a single zero byte which signals the end of the list. Each entry in the list includes a 7-bit line
count, a 1-bit continue flag, and the data to transfer on that scanline. If the continue flag is clear, the data will
include only a single unit, and it will be sent over the B bus on this scanline. Then this channel will pause for the number
of scanlines specified by the count before moving to the next list entry. If the continue flag is set, the data will
include multiple units of data equal to that of the count. One unit will be sent on each scanline for
this many scanlines; after which it will move onto the next list entry. Finally, if the count is zero then this HDMA
channel will be suspended for the rest of the frame. An additional setting in the HDMA properties
register allows for an indirectly addressed HDMA table–this means that instead of the
units of data being included in the table, pointers to data can be used instead. This is useful for dynamic HDMA tables that
are stored in work RAM. The pointers can be swapped out for others
that point to different data tables in the ROM. As an example, let’s look at how this windowing
effect could be recreated using HDMA. To set up the HDMA transfer we need to set
the A bus and B bus addresses, as well as the transfer properties. Suppose the table is stored in ROM at $0BE00F. Using DMA channel 3, that address will go
into registers $4332 through $4334. The PPU register to modify in order to set
the left side of window 1 is $2126, so #$26 will be stored into $4331. Data is moving from the A bus to the B bus
so we reset bit 7 of $4330. The HDMA table will use direct addressing
format, not indirect format, so bit 6 stays reset also. And finally, the window register is write-once
and only one byte wide, so the transfer format is mode 0. This means each unit of data will be only
one byte in size. Then, to initiate the HDMA transfer on channel
3, we set bit 3 of $420C. This could be done by writing the constant
#$08 to the register, but this would also disable any other HDMA transfers that were
previously set up. Since the HDMA enable register is not readable,
one way of properly initiating the transfer would be to set up all the channels first,
then initiate them all at the same time. Another way would be to keep track of which
channels are currently enabled in a separate variable in RAM, and logically OR the new bit into that value
before writing it to $420C. Now what would the HDMA table look like? The left edge of the window at
the top of the screen is at position $60, and it stays at that position for $60 scanlines. So the first entry in the table would just
be $60 $60–set the register to $60 then wait for $60 scanlines. Then, for $10 scanlines, the position of the
window changes every scanline. The line count for the next entry in the table
would be $10, and the continue flag should be set. So the first byte would be $90 in this case. Then, $10 bytes should follow, which would
be the position of the window for these next $10 scanlines. And finally, the position of the window doesn’t
change past this point of the screen. So the HDMA transfer is complete, and $00
should be written at the end of the table. HDMA is very powerful since it allows easy
modification of many different PPU registers in the middle of rendering a frame. The end result of one frame of rendering is
just that–a single image; however, the values of the registers used to create that image
were not constant and changed over time. A great way of visualizing this change in
the register is by recreating the image one scanline at a time by referencing a virtual
copy of the screen that has the entire image rendered at once using only the values of
the registers at that instant. The end result is something that looks very
similar to the rolling shutter effect that occurs in real life with certain video cameras. Looking back at the windowing example, we
see that even though there is only a single register that controls the left position of
the window, it changes over time exactly when the image is rendered to produce the illusion
that each scanline is controlled separately. The rest of this video will just be looking
at examples of various PPU registers and what common effects can be produced by using DMA
and HDMA transfers. By using DMA to write to OAM in the middle
of the frame, the number of objects on screen can be artificially increased. This example was hinted at in the previous
video; Super Mario Kart keeps track of two OAM mirrors, one for each half of the screen. F-blank is enabled in order to perform this
transfer, which explains the thick black bar in the middle of the screen. If the animation is slowed down, each entry
into OAM can be seen updating one after the other. Probably the most useful and versatile register
to modify with HDMA is the background mode register. Many games will opt to render a HUD or text
box in one mode, and the rest of the game in a different mode. This allows for the main portion of the game
to take advantage of the higher bit-depth of one background mode, but allow the other
portions of the screen to benefit from the multiple layers, like in Super Mario World
here. Scrolling the background layers mid-frame
allows for wave-like effects. In the first screen of Castlevania: Dracula
X, both vertical waves for the fire and horizontal waves for the background were used. By scrolling the background horizontally in
large jumps, parallax scrolling can be achieved without putting each background element on
a separate layer. In Donkey Kong Country 2, this effect is used
for the clouds and the ocean combined. The most well-known usage of HDMA is to achieve
perspective effects via mode 7 scaling and rotation. Mode 7 transformations are strictly linear,
so in order to have non-linear transformations, HDMA must be used to modify the matrix parameters
every scanline. Many games made use of this effect, including
F-Zero and Pilotwings, shown here. This was commonly used together with background
mode switching in order to have a horizon and background, as well as a HUD on a different
background layer. The fixed color constant can be modified to
create gradient effects with color math. It can be used to change just the back area
color, such as this one in Yoshi’s Island. And it can also be used along with color addition
and subtraction–this is the method used to create the gradient in the text boxes in Final
Fantasy III. And finally, any windowing effect that isn’t
just vertical bars uses HDMA to modify the left and right boundaries of the window while
the image is rendered. Super Metroid has a couple windowing effects,
one for the Eye shining its light beam, and another for the power bombs. And with that, the PPU-centric chapters of
this series come to a close. The next video will be about controllers and
gamepads, and how input is taken from the player’s hands and fed into the software. As always, thank you for watching.
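
For reference, the HDMA table from the windowing example can be sketched in Python as follows. This is only an illustration of the table layout described in the transcript (a 7-bit line count, the continue flag in the top bit, then the data, with a single zero byte ending the table). The helper names are hypothetical, and the sixteen window positions are made-up placeholder values, since the transcript does not list them. The last helper shows the idea of keeping a RAM copy of the write-only HDMA enable register at $420C.

```python
def hdma_entry(count, data, repeat=False):
    """Encode one direct-mode HDMA table entry with one-byte units (mode 0).

    With repeat clear, `data` holds a single byte that is sent once, after
    which the channel waits `count` scanlines. With repeat set, `data` holds
    `count` bytes, one sent on each scanline.
    """
    if not 1 <= count <= 0x7F:
        raise ValueError("line count must be nonzero and fit in 7 bits")
    expected = count if repeat else 1
    if len(data) != expected:
        raise ValueError(f"expected {expected} data byte(s), got {len(data)}")
    header = count | (0x80 if repeat else 0x00)  # continue flag is bit 7
    return bytes([header]) + bytes(data)


def window_table(slope_positions):
    """Build the table from the windowing example in the transcript."""
    return (
        hdma_entry(0x60, [0x60])                           # $60 lines at $60
        + hdma_entry(0x10, slope_positions, repeat=True)   # $90, then $10 bytes
        + b"\x00"                                          # end of table
    )


def enable_hdma(shadow, channel):
    """Return the new value to write to $420C, given a RAM copy of it."""
    return shadow | (1 << channel)


# Placeholder positions; the real values depend on the window shape.
EXAMPLE_POSITIONS = [0x60 + i for i in range(0x10)]

if __name__ == "__main__":
    table = window_table(EXAMPLE_POSITIONS)
    print(" ".join(f"${b:02X}" for b in table))
    print(f"$420C <- ${enable_hdma(0x00, 3):02X}")
```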

17 thoughts on “DMA & HDMA – Super Nintendo Entertainment System Features Pt. 07”

  1. This is a reupload due to a few mistakes in the video that warranted more than just a mention in the description.

  2. Okay, this video lost me. All the others in the series were easy to follow, but all this memory address stuff seems to have a prerequisite of learning assembly or something.
    I like the examples of using HDMA though; I kinda understood those.

  3. I develop for the Genesis. I know that the DMA chip on the Genesis can move about 91 bytes per Vertical blank which amounts to 22 screen tiles. So does this mean that the Super Nintendo can move more screen tiles? According to my math it's moving 44.3 kilobytes per vertical blank on NTSC. I don't think that's right… I think I'm doing the math wrong.

    EDIT: Did a bit of research it seems like the Genesis DMA was faster by virtue of it's 16 bit bus, BUT it is not significantly faster. Also the SNES uses 2 8 bit buses for DMA which reduces the size per transfer but it was asynchronously clocked so it could achieve much higher speeds. It also appears that SNES DMA was severely hampered by SLOW ROM access. Does any of this matter…. no not really. If you want to read more take a look at this: https://www.quora.com/Was-the-Sega-Genesis-faster-than-the-Super-NES

    I would really like to do some game development on the SNES later but it will require a major adjustment to my workflow. THE SNES 65C816 is a very alien thing to me. Lots of specific commands that I've never seen before. Thank you for the videos.

  4. Wouldnt it be easier for Nintendo to just allow mode 7 to also scale the screen differently on the too and make a plane view such as the one used in f zero?

  5. Your videos are so fantastic – I love how thorough and detailed you get (I feel like I come away from every video with a true understanding of the topic beyond just a vague idea of how it works in theory) and your visualizations are absolutely top-notch. This series especially has made me really come to appreciate the work done to create games for consoles like the SNES and NES – as someone who grew up with more advanced hardware and software it's easy to take for granted things like rotation and scaling, perspective transforms, distorted backgrounds, having tons of sprites onscreen, etc. but watching your videos has really made me understand and appreciate how much specialized work had to be put into these effects both on the hardware and software ends. So thank you so much for the time and effort you put into understanding these consoles and creating your videos.

  6. I don't understand a single thing…….BUT still super interesting to hear about how old school games/consoles operated under the hood
