A Treatise on Lag
1.02
shren.net

Changes for 1.02:

  • Added some bits on desync.
  • Added entries for Black Wall and Desync in the big table at the bottom.
  • Lots of little trivial changes.

Changes for 1.01:

  • Removed quest rant.
  • Included lots of bits of advice/wisdom from readers.
  • Tried to clean up the sections.
  • Naughty words removed. (Thanks, Brig.)

Everybody hates lag.

Few actually understand it.

Lag's like some kind of anthropomorphic Rushdie.

Lag is your most dangerous opponent in Diablo II. A guildmate of mine, BYC, once said, "Lag kills all builds." There's not a monster in Diablo II that can't be taken down by a good character with a good plan. However, lag can bring even the best character with the best plan to his knees. The design of the Diablo II classes and gear doesn't help - almost every high level melee character is dependent on life leech to survive, and to leech, you have to attack, and lag often prevents that. Understanding what lag is, how it occurs, and what one can do about it levels the playing field as much as possible.

I'm a computer programmer, and I've done my share of network programming. Reading this treatise won't make you a network engineer, but it will give you important insight into the underlying principles of a networked game, and this understanding will help lag become more of a known quantity.

First, I'll break down the components involved, hardware and software, and then we'll get to the ramifications of these things.

The Server

What do you think of, when you think of the server? Do you think of a computer much like yours but with no player, its monitor showing your game to a room full of other servers? An interesting image, but unlikely. There's no need for input, graphic output, monitors, or keyboards. A server program just has to keep track of where everything (monsters, treasure, etc.) is and what it's doing. All of the real heavy CPU and memory eaters - graphics and sound - are unnecessary. The server doesn't need to know what a flayer looks or sounds like. That's all your client. It just needs to know what it acts like, how many hit points it has, how much damage it can do, and where it is.

So, throw away any preconceptions. Your server process is probably just a process whose only input and output is the network, running on a Linux cluster in a rack. And, likely, there are 20 different games being run on the same CPU.

The Network

The Internet is a packet-switched network. This simply means that all information being passed between any two computers on the internet travels in the form of packets. These packets are a lot like people on a Greyhound trip. They arrive at each station, find out where they need to go next, wait in line, and get on the next bus. The waiting in line is probably the largest source of delays in the modern internet. Travel across a wire happens very quickly, but if there are a lot of packets in line, and a lot of lines to wait in, then all of the waiting adds up to lag. (Fun application of physics for you physics students out there: Based on the speed of light and the distance between you and the server, determine the absolute minimum round trip ping time.)
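For the physics students, here is a rough sketch of that calculation in Python. The 1,000 km distance is a made-up example, not a measurement of anyone's real route, and the function name is my own. Signals in fiber travel at roughly two-thirds of the vacuum speed of light, so the second figure is the more realistic floor.

```python
# Lower bound on round-trip ping from distance alone (no queueing, no routers).
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum


def min_round_trip_ms(distance_km, velocity_factor=1.0):
    """Smallest possible round-trip time in milliseconds.

    velocity_factor is the fraction of light speed the signal manages
    (1.0 = vacuum; about 0.67 is a common rough figure for fiber).
    """
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor)
    return 2 * one_way_s * 1000


# Hypothetical example: a server 1,000 km away as the crow flies.
print(round(min_round_trip_ms(1000), 2))        # vacuum
print(round(min_round_trip_ms(1000, 0.67), 2))  # rough fiber estimate
```

Whatever your real ping is, anything above this number is time spent waiting in line or taking the scenic route.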

You know that you have a network connection, and that this network connection hooks up to your ISP. Let's finish the picture. Your ISP hooks to other ISPs through bandwidth providers - people who own the actual cable that stretches from point a to point b. At the other side, we reach battle.net's ISP, which is a special case which we'll talk about in a bit.

Using a handy program called traceroute, we can learn about the path from our computer to Duriel. (It's c:\windows\tracert.exe on most Windows versions if you want to play with it. You'll need to invoke it from the command line. Something like "c:\windows\tracert useast.battle.net" should work.) The important part to look at below is the far right side, which gives the machine names for the hops in order, starting at my machine and going to battle.net. We start at Newton, go through zoomtown, then through alter.net, then finally end up at exodus.net. The *ip deleted* messages below are me cutting out my internet address, which you don't need to know.


Tracing route to useast.battle.net [64.14.113.140]

over a maximum of 30 hops:

  1     2 ms   <10 ms     2 ms  NEWTON *ip deleted*
  2     1 ms     2 ms     1 ms  *ip deleted*
  3    16 ms    21 ms    18 ms  *ip deleted*
  4    15 ms    16 ms    15 ms  *ip deleted*
  5    17 ms    16 ms    16 ms  WS-GSR-1-pos3-0-oc12.zoomtown.com [216.68.0.5] 
  6    17 ms    18 ms    16 ms  192.168.9.12 
  7    17 ms    17 ms    17 ms  192.168.8.101 
  8    20 ms    18 ms    18 ms  nhg-onenet1.zoomtown.com [216.68.212.53] 
  9    18 ms    32 ms    19 ms  fe-2-0.ztown1.cvg1.one.net [206.112.199.13] 
 10    19 ms    17 ms    22 ms  fe-1-0-0.core4.cvg1.one.net [216.23.23.177] 
 11    32 ms    21 ms    20 ms  fe-4-0.core2.cvg1.one.net [216.23.31.2] 
 12    23 ms    31 ms    38 ms  Serial1-1-0.GW1.IND1.ALTER.NET [157.130.96.169] 
 13    28 ms    33 ms    27 ms  121.at-1-1-0.XR2.CHI4.ALTER.NET [146.188.208.166] 
 14    26 ms    27 ms    30 ms  194.at-2-0-0.TR2.CHI2.ALTER.NET [152.63.65.70] 
 15    55 ms    55 ms    55 ms  126.at-7-3-0.TR2.DCA8.ALTER.NET [146.188.141.166] 
 16    56 ms    56 ms    74 ms  0.so-2-0-0.XR2.DCA8.ALTER.NET [152.63.35.250] 
 17    57 ms    55 ms    59 ms  POS7-0.GW2.DCA8.ALTER.NET [146.188.162.193] 
 18    71 ms    61 ms    61 ms  exodus-OC12.DCA8.customer.alter.net [157.130.42.58] 
 19    69 ms    87 ms    70 ms  dcr03-g9-0.stng01.exodus.net [216.33.96.145] 
 20    73 ms    71 ms    71 ms  csr11-ve241.stng01.exodus.net [216.33.98.154] 
 21    69 ms    70 ms    71 ms  216.33.115.242 
 22    85 ms    84 ms    73 ms  useast.battle.net [64.14.113.140] 

Trace complete.

The important thing to catch above is that, essentially, Newton (a computer in my office) links to my ISP, ZoomTown, which talks over alter.net's cables to exodus.net. Visiting alter.net, we can see that they are essentially an ISP's ISP, selling their global network to people who want connections, public or private. That leaves exodus.net. Exodus is a very specialized company. If you want to put a large group of machines on the net, with the best data security, connection, and backups money can buy, you go to someone like Exodus. Providing server space with good connections is what they do. Exodus is battle.net's ISP, but there is no last hop to Blizzard - battle.net is at Exodus.

If you're curious about the economics involved, I pay ZoomTown, Blizzard pays Exodus, and both Exodus and ZoomTown pay Alter.net. Probably. It's not my area so I could be wrong. Brief aside. Hiring Exodus to carry your servers is nothing even remotely near cheap. I am surprised that Blizzard can offer battle.net to us for no additional cost. Remember this before you start another "Blizzard is crap" rant in battle.net chat. They could have done a lot worse for realms players and still made a whole pile of money.
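If you want to see where the time goes in a trace like this, a small Python sketch can find the hop with the biggest jump in delay. The numbers are hops 12 through 22 copied from the trace above; the function name and the choice to compare the best of the three samples per hop are my own assumptions, and routers sometimes answer trace probes slowly for their own reasons, so treat the result as a hint and not a measurement.

```python
# (hop number, three round-trip samples in ms), copied from hops 12-22 of the trace.
hops = [
    (12, [23, 31, 38]),
    (13, [28, 33, 27]),
    (14, [26, 27, 30]),
    (15, [55, 55, 55]),
    (16, [56, 56, 74]),
    (17, [57, 55, 59]),
    (18, [71, 61, 61]),
    (19, [69, 87, 70]),
    (20, [73, 71, 71]),
    (21, [69, 70, 71]),
    (22, [85, 84, 73]),
]


def biggest_jump(hops):
    """Return (hop number, added ms) for the largest rise in best-case delay."""
    best = None
    for (_, prev_ms), (hop, ms) in zip(hops, hops[1:]):
        delta = min(ms) - min(prev_ms)  # best sample filters out one-off spikes
        if best is None or delta > best[1]:
            best = (hop, delta)
    return best


print(biggest_jump(hops))  # (15, 29)
```

Judging by the hostnames, hop 15 is probably the long haul between alter.net's CHI and DCA sites, which would explain why it adds more delay than any other single step.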

The Client

It is the client's job to let you in on the action on the server. It's your in, your window onto the game world on the server. If the server tells your machine everything, you see everything. If the server tells your machine nothing, you see nothing.

Think, briefly, of Plato's Allegory of the Cave. (If Philosophy is not your cup of tea, skip the next two paragraphs.) In short, Plato thought that true reality was inaccessible directly. His description of this went something like this. Picture a bunch of people in a cave, chained to the wall, so that they can't see outside. Now picture things at the entrance to the cave, casting their shadows into the cave where the chained people can see them. The people don't see the source of the shadows - they can't move. They do see the shadows, and that's the only clue they get as to the true nature of reality. They never see the things themselves. (Plato thought that all physical objects were 'shadows' of perfection. But, back to computers...)

Well, imagine yourself as chained to a wall, because this is exactly the way it is. You will never see, feel, or touch the actual reality that your character is in. You will only see the shadows the server casts upon your monitor.

The Illusion of being on the Server

From an engineering perspective, the objective of the client-server system is to provide you, the client, the best possible representation of what's going on over at the server, using as little bandwidth as possible.

The server is not telling your client, "put a color 921 pixel at location (200,300)." Assuming you can pack pixel information into 48 bits, to tell your computer what color every pixel on your screen should be, at 30 frames per second, on a 640 by 480 screen, would require a 442,368,000 bps link. Yes, that's over four hundred forty million bits per second. Wow, that's even more than I expected. Doublecheck... yup, over four hundred forty million. Imagine that on the requirements for a game. "Required : a Pentium 200, 128 megs ram, and a four hundred forty million bps connection to the internet." (So much for thin clients.)
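The arithmetic is easy to check for yourself. A quick Python sketch, using the same assumptions as the paragraph above (48 bits per pixel, 640 by 480, 30 frames per second); the function name is my own:

```python
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 48
FRAMES_PER_SECOND = 30


def raw_video_bps(width, height, bits_per_pixel, fps):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


bps = raw_video_bps(WIDTH, HEIGHT, BITS_PER_PIXEL, FRAMES_PER_SECOND)
print(f"{bps:,} bits per second")  # 442,368,000 bits per second
print(f"about {bps / 56_000:,.0f} times what a 56k modem is rated for")
```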

So you've got to optimize. You put all of the graphics on the client, and instead, tell the computer, "Put the flayer graphic at location (x,y) in animation pose z. It has q hit points." That's about, say, 20 bytes. For 30 frames per second, that's 600 bytes per second. For the worst possible situation, imagine 50 flayers on the screen. That's 30,000 bytes per second, or 240,000 bps. Now, this neglects connection overhead for IP, UDP, PPP, and lost packets. I just ate all your bandwidth if you're on a 56k line - sorry. This also ignores *everything* else the server has to tell your client - what background animations to run, what the other players are doing, and what treasures the flayers are dropping as your frozen orb mows them down.

If you're thinking about the ramifications of this, you're probably thinking, "Slow down. You seem to be implying that Diablo II isn't even possible on a dialup - it eats too much bandwidth. But I played it just last night." Granted. But do you dispute my 240,000 bps claim above?

This is why this section is called "The Illusion". Your client is pretty smart. It makes some intelligent assumptions about the monsters and the game world. It assumes that a treasure doesn't move until someone moves it. It assumes that a monster doesn't move unless the server says it does. In all likelihood, we can redo this math - 20 bytes of flayer information once a second, for a total of 20 bytes (160 bits) per second of bandwidth use per flayer. The client is told, perhaps about once a second, about where the flayer is and what it is doing, and it presents this information in the most plausible way possible. Now we can handle fifty flayers. They only need 1,000 bytes per second, or 8,000 bps of bandwidth.
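Here is the same back-of-the-envelope math in Python, counted in bits per second so it can be compared directly with a modem's rated speed. The 20-byte update size is the guess from the text above, not a measured Diablo II figure, the function name is my own, and protocol overhead is ignored.

```python
BYTES_PER_UPDATE = 20  # the guess from the text for one flayer update
BITS_PER_BYTE = 8


def update_bandwidth_bps(monsters, updates_per_second,
                         bytes_per_update=BYTES_PER_UPDATE):
    """Bits per second needed to keep the client posted on every monster."""
    return monsters * updates_per_second * bytes_per_update * BITS_PER_BYTE


every_frame = update_bandwidth_bps(50, 30)   # one update per frame at 30 fps
once_a_second = update_bandwidth_bps(50, 1)  # one update per second

print(every_frame)    # 240000
print(once_a_second)  # 8000
```

Cutting the update rate from 30 a second to 1 a second is what turns fifty flayers from impossible on a 56k modem into a small slice of it.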

But now we're hip deep in illusion. This is how flayers are invisible one second and surrounding you the next. If you only get 4 updates a second or so on where the flayers are, and the flayers are moving pretty quickly, then a few lost (or even delayed) packets can cause flayers to skip around.

Here's where the illusion gets smoothed out, to become seamless. If the client was last told that a flayer was 12 squares away from you, and is now told it's 8 squares away from you, and is later told that the flayer is 4 squares away, you don't see a flickering flayer. You see the flayer run towards you. Your client is filling in the details. It knows that if the server says, "flayer : (-12,0), then flayer (-8,0)" then the flayer, in the server's mind, did actually cross positions (-11,0), (-10,0), and (-9,0). So that's what your client shows - a smoothly running flayer.
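A minimal sketch of that filling-in, in Python. I have no idea how the Diablo II client really does it; this is plain straight-line interpolation between two reported positions, and both function names are my own.

```python
def interpolate(pos_a, pos_b, fraction):
    """Straight-line blend between two (x, y) positions; fraction runs 0.0 to 1.0."""
    ax, ay = pos_a
    bx, by = pos_b
    return (ax + (bx - ax) * fraction, ay + (by - ay) * fraction)


def frames_between(pos_a, pos_b, steps):
    """Positions to draw while moving from pos_a to pos_b over `steps` frames."""
    return [interpolate(pos_a, pos_b, i / steps) for i in range(1, steps + 1)]


# The server said (-12, 0), then (-8, 0); the client fills in the rest.
print(frames_between((-12, 0), (-8, 0), 4))
# [(-11.0, 0.0), (-10.0, 0.0), (-9.0, 0.0), (-8.0, 0.0)]
```

The client never received (-11,0), (-10,0), or (-9,0) from the server. It drew them because they are the only sensible way to get from one reported position to the next.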

Here's an especially nifty trick Diablo 2 uses. Have you ever known someone so well that you can finish their thoughts? The server and client are like that. The client knows what a chasing monster acts like, so sometimes it will show the monster chasing you when it only thinks it's chasing you. If you've ever killed a monster chasing you only to have its corpse end up somewhere weird (not where you killed it), this is probably what occurred. (I forgot to mention monster modes in my first article. Forgot, or hadn't noticed them yet. Thanks to carpetwax of New Zealand for his feedback that brought this to my attention.)

So, when you see that flayer on the screen, there's no real guarantee that the flayer is actually in that location. It's just the best information the client has at the time, and it does its best to keep you up to date. The client and server cut you a lot of slack, though. Most of the attack abilities in Diablo II have a remarkable amount of auto-targeting. Fiddle around with a +3 teeth wand on a necromancer. The teeth aim at monsters, even monsters you aren't clicking on, if they are in the front firing arc. Not only that, but if you are doing direct fire, clicking directly on a monster, the server seems to do you the favor of firing at where the server thinks the monster is and not where the client thinks the monster is. (Aside. A wall of the eyeless and a +3 teeth wand make for a fun first 10 levels of a necro's life, as long as you are in a small game. Teeth costs 3 mana, but the WotE gives back 5 per kill. Free mana if you're killing things, and you can kill four things with two teeth casts.)

Nice. But it's too bad you can't take a red pill and do away with the illusion. That's life on the realms.

The Connection

Exactly how the server and client connect is pretty important. My own observations led me to believe that Diablo II was using UDP. Email from Xevioux and Psychohist led me to actually look at the network traffic. I was wrong about the UDP - I just got done running D2 under a packet sniffer and it's TCP all the way, both ways. Transmission Control Protocol is what TCP stands for, and its main advantage is that it makes sure that all of your messages get to the server, and in the right order. Its disadvantage is . . . all together now . . . it's slower than UDP. I was sure, however, before checking, of TCP being used for half of the connection (server -> client), because of the time compression.

What's time compression? Well, in the course of my job at Rovion my boss let me in on an interesting tidbit. Often, shows on TV and on the radio are time compressed. The recording is played slightly slower or faster, like on a very slow fast forward or "slow" forward. This is done to get more advertising space or make the show fit a time block, and it's very hard to notice.

Diablo II does something like time compression. If, because of lag, the client only has a few frames to show you, then of course things look jerky. However, the reverse is also true. If, because a bunch of data gets delayed, the client has three seconds of action to show and only one second to show it in, it compresses the time together and shows it really, really quickly. Ever notice that sometimes after a lag burst, you suddenly see yourself doing things at light speed for a second? That's time compression, the client showing you everything that has happened that it just found out about, to catch up to the present. Usually, you just run around like mad, because you were lagged, you couldn't see what was going on, and you just wanted to get away.

TCP keeps all of the incoming transmissions in order. This kind of time compression would be very, very hard to do under UDP, so TCP it is.
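To make the idea concrete, here is a small Python sketch of how a client might pick a replay speed for a backlog of delayed action. This is my own guess at the principle, not Diablo II's actual code; the function name and the `max_speed` cap are invented for illustration.

```python
def playback_speed(backlog_seconds, catch_up_seconds, max_speed=4.0):
    """How fast to replay queued action so the backlog drains in time.

    Returns a multiplier: 1.0 is normal speed, 3.0 is triple speed.
    max_speed is an invented cap so the replay stays watchable.
    """
    if catch_up_seconds <= 0:
        return max_speed
    speed = backlog_seconds / catch_up_seconds
    # Never play slower than normal, never faster than the cap.
    return max(1.0, min(speed, max_speed))


print(playback_speed(3.0, 1.0))  # 3.0: three seconds of action, one second to show it
print(playback_speed(0.5, 1.0))  # 1.0: nothing to catch up on
```

Note that this only works if the delayed events arrive complete and in order, which is exactly what TCP provides.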

Lag : The Illusion Breaks

All of these client optimizations that actually make the game possible break down when some part of the trinity (client, network, server) doesn't do its job right, and confuse us to death (literally) instead of helping. Monsters zip around, die in strange places, and get off a couple volleys of missiles without us seeing them.

If you have a slow client, then you might not manage to load graphics and sound as fast as the server needs you to. This is called hard drive lag. In the first version of this document, I ranted on for a couple paragraphs about how the loading of Duriel's lair was a deathtrap. Duriel could kill you easily while your client was still loading the graphics. I finished with this quote: "They could have preloaded the Duriel level when you slot the Horadric staff. They didn't."

This is from the 1.04 patch announcement, under Improvements:

- Duriel's graphics are pre-loaded before entering his lair. This minimizes the chance of getting killed by him before loading is complete.

I rule. Thanks, Blizzard! Back to lag, specifically, load lag. All of the graphics for Diablo 2 are stored on your hard drive. Before they get displayed, however, they have to move from your hard drive to your system memory, and from your system memory to your graphics card. During this load time, the server is still going forward, cheerfully attacking your character.

You see load lag in action whenever you fire your first frozen orb of the game. Ever notice that said first frozen orb is jerky? That's because that first time, the graphics involved have to go from the hard drive to memory before they get displayed.

There are a few ways you can clear up load lag. Some of them are even free. First, Windows, by default, has a variable size swap file. This is good for some reasons, but it is inefficient. When Windows changes the size of the swap file, your computer slows down a bit. You might want to set the swap file to a specific size. Doing this is really more than I want to go into myself, but a web search turned up an article by a friendly guy named Frank Suszka which seems to cover the issue pretty well. Go take a gander and give it some thought. Make sure to defragment your disk (start -> programs -> accessories -> system tools -> drive defragmenter) if you do change the size, however.

Two and Three are, of course, "get more memory" and "get a better graphics card". More memory is never bad, but you can go overkill on the graphics card. Oh, and don't neglect your sound card. It can slow you down too. A warning sign of this - your sound skips when lag occurs. (I don't know what's in the machine that I used to play D2, sound card wise, but it's not top of the line, and sound lag is probably my biggest load lag problem.) Of course, you can always turn off music or even sound if that's what's holding you back. I like the sound effects, but if I were still playing hardcore my computer would be as silent as a lamb when I played.

Sometimes, the network doesn't do its job. Communications get delayed or lost. Messages show up too late to have any relevance to the scene at hand (like the "open town portal" message reaching the server a second after you die). If you're playing some FPS game and see "You have been kicked for high ping", that's network lag. Probably.

If you want to fix this, then you need a better connection. If you can't upgrade to cable or better, however, then you might try looking at a different ISP. Remember the traceroute above? You can't change the exodus parts of the link, or your local link. However, your ISP pays for their link to Exodus. (In my case, through Alter.net. See above.) If they are skimping on their network service and not paying enough for bandwidth, (passing the savings down, er, I mean up, to you, well, not you, but themselves.) then it might not be your 56K modem causing the problem. Talk to other modem users, find out what ISP they use. If your ISP can only get you 30K a second, then 26K of your already small link might be going to waste.

And, lastly, sometimes the server botches. It crashes, or drops your game, or has too many games to run to keep up the illusion with all of them.

Ramifications

One of the most important optimizations is the movement of your character. Your client doesn't know exactly where your character is, on the server, but it lets you run all you like, and it tries to get your position on the server and the moves you've told the client synchronized. This is critical to the smooth movement of your character, and if the client didn't let you move like this, then your character's position would be jerky enough to give you motion sickness, because you'd need an update for every foot you moved.

We can't do without this. But it causes two of the most prominent breaks in the illusion.

The first is called snap-lag (or jumpback lag). This is where you think you've run to one place and suddenly the server puts you in another. What's happening here is probably this: The network lost some of your movement commands. Thus, when the client is trying to tell the server, "I ran down the hall, through a door, took a left, and went up the other hall," and the server misses the "through the door" part, it doesn't understand how you got up the other hall. From the server's perspective, if you didn't open the door, then you can't possibly go through it, and it's not going to let you have a free teleport, so it puts you where it thinks you are and lets you deal with it. This involves you being "snapped" back into place. You might be able to avoid this, to some degree, by cutting corners as wide as possible, although this helps little with doorways.
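A toy version of that logic in Python shows why one missing message is enough. Everything here (the `Server` class, the command names, the one-square-at-a-time rule) is invented for illustration and is not how the real server is written.

```python
class Server:
    """Toy server that only accepts moves it can explain."""

    def __init__(self, start, closed_doors):
        self.position = start
        self.closed_doors = set(closed_doors)  # squares blocked until opened

    def handle(self, command, square):
        """Apply one client command; return the position the server believes."""
        if command == "open":
            self.closed_doors.discard(square)
        elif command == "move":
            x, y = self.position
            adjacent = abs(square[0] - x) + abs(square[1] - y) == 1
            if adjacent and square not in self.closed_doors:
                self.position = square
            # Otherwise the move is ignored: no free teleports.
        return self.position


def replay(commands):
    """Feed a list of (command, square) pairs to a fresh server."""
    server = Server(start=(0, 0), closed_doors=[(1, 0)])
    for command, square in commands:
        server.handle(command, square)
    return server.position


full = [("open", (1, 0)), ("move", (1, 0)), ("move", (2, 0))]
lossy = full[1:]  # the "open the door" message never arrived

print(replay(full))   # (2, 0)
print(replay(lossy))  # (0, 0)
```

In the lossy case the client has already drawn you at (2, 0), the server still has you at (0, 0), and the server's answer is the one that counts. That correction is the snap.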

Brief aside here. People who design client-server games have gotten burned over and over across the years for trusting the client too much. You see, when you trust the client, then someone designs a malicious client and screws you. In the above example, if the server was willing to take the client's word for it : "I don't know how you got there but you say you're there so you must be there", then somebody would hack a client that used this principle to get free teleporting. The UO team got burned on this, too. They tried to lower lag for the good of the players by moving as much functionality as possible out to the client, and thus reduce lag through reducing bandwidth. One of the things they did was delegate the concept of invisibility to the client. So the server said, "This guy's invisible, so, client, don't show him unless you've got enough spot hidden to see him, ok?" It took less than a week for somebody to hack code into UO Extreme to unfairly reveal all invisible players. So the UO team took out the client-side invisibility handling, thus boosting bandwidth usage and increasing lag. The lesson learned by developers is, "you can't trust the customer at all", and it's been the player's loss.

Back to Diablo. The other major movement-based illusion-breaker is called "the black wall of death". If the server crashes, then the client just lets you keep on going until it is sure the server is toast. This includes moving. So the monsters stand still, but you can move around. If you run far enough in one direction, you will encounter "The black wall of death", where you hit the edge of the world. There will be a black nothing with nothing beyond it, that you can't walk into. What's probably happening here is that you've hit the edge of the world that the client loaded before the server crashed. (I had major Neverending Story flashbacks when I first had this happen to me.)

Oddly enough, this doesn't often indicate a total server crash or stall. Lots of people have sent me "black wall" stories. The server does often recover from this. In fact, the server may not be completely down. Often, when getting a black wall, I can still see other players moving on the automap. (Artic Wolf reports that you can even get Black Wall offline!)

When it comes down to it, I have no idea what causes Black Wall. It's somehow a response to the client not being able to get data it needs. It's one of those freak bugs that only comes out in stressed situations, which makes it very hard to find and fix, because you cannot intentionally reproduce it to experiment. Running from the black wall seems to help, but I have had black walls disappear while standing there staring at them.

Desync (added in 1.02)

I've finally experienced some desync, so I can finally write about it. Desync is a total state of confusion between the server and the client. The client is still running, and still shows you as being at a location, but this location has no bearing on where the server actually thinks you are, and the server's is the only one that counts. The client and server are desynchronized - that's where the term "Desync" comes from - and the client isn't marching to the beat of the server's drum anymore.

Desync is intensely frustrating. Nothing you do translates to meaningful action on the server. My guided arrow Amazon couldn't hit things in a crowded Bloody Foothills with guided arrow while desynced. Since GA requires no aim at all, I have to believe that my GA wasn't actually reaching the server. Desync is completely crippling - never try to fight if you are even slightly desynced. You might be swinging at air while the monsters pound you.

Desync and Black Wall go hand in hand - all instances of Black Wall may be instances of Desync, to some degree. Since the client and server are out of sync, the client could show you as moving to some area that the server hasn't sent you information on - and won't, since you aren't actually there - thus causing you to see black wall.

In cases of black wall and cases of desync, the first thing you should do is pop a town portal. If that town portal doesn't appear next to you, something is wrong. Frustratingly enough, I've been desynced, managed to go through a town portal to town, gone back through to the Bloody Foothills, and *still* been desynced. All desync should be fixed if you leave the game you're in.

It's a common conception that having way too much fast run is a cause of desync. I can't confirm or deny this. I have a Frenzy Axe barb, who, like all Frenzy barbs, can run faster than a squirrel on fire. I didn't notice lots of desync. Interestingly enough, I have a joke character who can and does summon 60 minions. (He plays alone a lot.) He gets lots of server, client and network lag, but no desync or black wall. My guided arrow/lightning fury amazon gets more desync than any other character I have, and she has little/no fastwalk, and of course only her valk and her merc.

Bonus Hint: send the word "fps" as a message. It'll add some text at the top of your screen showing, among other things, your ping time and frames per second.

I'll finish with a summary:

Lag Type: Client Lag
Symptoms: Huge pauses accompanied by lots of Hard Drive or CD Drive activity.
Problems: You die crossing levels or waypoints, because the monsters kill you before your machine even loads the graphics for you to see them.
Solution: Make sure you're doing the biggest install you can. If you've done a full install and still have this problem, try to get some more system memory, by buying more or running as few other programs as possible. A graphics card with more memory couldn't hurt. There's a program out there called the D2 Accelerator which I *highly* recommend for client lag, although it can't help any other kind of lag.

Lag Type: Server Lag
Symptoms: The world stops, like something out of the Langoliers, or you get a really sudden disconnect but your network is still up.
Problems: You can't do anything. The server thinks you are doing nothing, and if you get a total disconnect then you'll do it (nothing) for 15 seconds, or 10 in hardcore, then you get dropped.
Solution: Play on off hours when things are less crowded. If your net connection is really good, and you can make a character for a server in a different time zone and play when everyone in that time zone will be asleep or at work, you should be able to remove server lag and, well, replace it with some Network Lag.

Lag Type: Network Lag
Symptoms: Action becomes sporadic and choppy. Monsters move erratically.
Problems: You can't know exactly where the monsters are, so it's hard to fight them. Sometimes you get disconnected after a long period of problems.
Solution: Get a better net connection. Play when the net is less busy. If neither of these are options, some classes deal a lot better with lag than others. Sorceresses do horribly in lag, because their low HPs make them really prone to sudden horrible death and they have no minions to help them. The minion summoning of the Necromancer, and to a lesser extent, high level Amazons, is great because minions are immune to lag, and even if you get disconnected they'll defend you to the death. A paladin with thorns and some life absorb items is almost immune to melee damage, which will help you if your thorns is up and you get heavy lag. Barbarians, to my knowledge, are the second worst class in lag - because they have to swing at their targets, they are almost as helpless as sorcs, but they can take a larger pounding before dying.

During the lag - RUN. You might not be able to see yourself run, and you might not actually move, but if you do, and the server catches on to your running, it might save you for long enough for the lag to die down. Run and drink potions. Do not forget the potions. (I forgot the potions in the first version of this article. Thanks, Sir Pudding.)

Lag Type: Desync
Symptoms: Everything seems to be working fine, but nothing you do actually affects the monsters. You can see other people move and talk and kill monsters, perhaps even at normal speeds, but you can't. You open a town portal and don't see it, or worse, it appears far away from your character.
Problems: Your client and server are hopelessly confused about where you are. This is a temporary, and for most people, rare occurrence.
Solution: I highly suggest you leave the game. I don't know of any reliable way to get rid of desync in game, but there shouldn't be a case of desync that leaving and reentering the game won't fix. This is the number one killer of high level hardcore characters, I gather.

Lag Type: Black Wall
Symptoms: A big black wall, like the Nothing out of the Neverending Story, replaces a portion of terrain you are nearing.
Problems: You are trying to go to terrain that your client doesn't have the information on. This could be because of any of the above problems.
Solution: Two solutions for two problems.

One. Black Wall sometimes happens on its own, with no other problems. If that's the case, you've hit some trivial once in a year bug that's hard to find and fix. Leave the area/game and return, and it should be gone.

Two. Black Wall can be caused by most of the above problems. If you're desynced, your client can falsely show you as being somewhere you are not, and because you're not there, the server hasn't sent you the map data. If you're having network lag, the data might be delayed in getting to your computer. Figure out what the cause of the black wall is, and fix that problem. There's nothing you can do about black wall directly - it's almost always a symptom of another problem.

Best of luck. Down with Morgoth! Down with Diablo! And Baal better be watching his back, I'd imagine.