Man - this blog isn't even a day old and I'm already getting shit about the information overload caused by using sophisticated new Wiki technology. One doesn't like that he can't read it on his Lunix box, another moans that there's too much stuff they can click on and can't I just use a text file?\n\nNo dammit! I like snazzy new gratuitous tech. This blog is going to be mainly about snazzy new gratuitous tech (with the occasional UphillBothWays nostalgia moan added for good measure). And since this is a thing as totally meaningless and self-centered as a blog, I'm going to use it like I stole it. Er... which I did (thanks [[Jeremy|http://www.tiddlywiki.com/]]). It's not like I'm demanding you use MS tech or anything hentai like that.\n\nEr... news? No, I have none of that. Except that I went to see my accountant today and he said I owed The Man five digits. OK, so as an immigrant Brit living in the US seeking asylum from crappy pay and on a "you can stay if you don't cast nasturtiums on Mr. POTUS" visa making money for US companies and spending all my hard-earned cash on other US companies, would it be slightly impolitic to utter the phrase "No taxation without representation?" Yeah, thought not. Even after I get my green card, I have to tell HRH Brenda to shove it before I get to cast a pointless vote in my local United Franchise of America. Gonna book me a plane ticket to Boston, gonna tip a crate of your shitty bourbon into the harbour. Yes, "harbour". It's got a "u" in it. Blow me. Where's my Laphroaig, [[wench|TheWife]]?\n\nTo be fair, it's all because I got a mildly obscene Christmas bonus on account of the SuperSecretProject. You so want my boss. But you can't have him. He's mine. All mine!
Double precision - it's not magic. But people keep invoking it like it is.\n\nFloating-point numbers are brilliant - they have decent precision and a large range. But they still only have 32 bits, so there's still only 4billion different ones (actually there's fewer than that if you ignore all the varieties of NANs, assume infs are basically bugs, and turn off denormals because they're agonisingly slow). So they have tradeoffs and weaknesses just like everything else. You need to know what they are, and what happens for example when you subtract one small number from another - you get imprecision. One "solution" is to turn on double precision, because then you get more bits. Yeah, but you've just shuffled the problem around. Doubles have exactly the same weaknesses - all you've done is shuffle the problems into a corner, stuck your fingers in your ears and yelled "lalalalalala". They'll still come back to bite you, and because it's double precision, it'll be even rarer and even harder to track down. And they're slower on most machines, so you've hurt your execution speed.\n\nThe right thing to do is go read a book on numerical precision. The basics are - any time you do a subtract (or an add, implicitly), consider what happens when the two numbers are very close together, e.g. 1.0000011 - 1.0. The result of this is going to be roughly 0.0000011 of course, but the "roughly" is going to be pretty rough. In general you only get about six-and-a-bit decimal digits of precision from floats (2^23 is 8388608), so the problem is that 1.0000011 isn't very precise - it could be anywhere between 1.0000012 or 1.0000010. So the result of the subtraction is anywhere between 1.2*10^-6 and 1.0*10^-6. That's not very impressive precision, having the second digit be wrong! So you need to refactor your algorithms to fix this.\n\nThe most obvious place this happens in games is when you're storing world coodinates in standard float32s, and two objects get a decent way from the origin. The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera, it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation.\n\nBy the way, some rendering code doesn't even do that subtraction. Canonically, you bake that subtraction into the offset of your "camera matrix" in your rendering code and let the GPU do it. This is madness - GPUs have even less precision than full IEEE754 floating-point, and you're doing a whole bunch of vector math before you finally do the subtraction. So your precision problems are going to be a lot worse. Best all round if you just subtract camera position from object position on the CPU before it gets anywhere near a GPU.\n\nThe solution is obvious - if single precision isn't enough, switch to storing your positions in double precision! Well er yeah, that kinda works. Mostly. But there's some problems. The most obvious is that some machines simply don't do doubles (PS2), and on machines that do support doubles, there's usually speed penalties.\n\nSo what's the alternative? Well, good old fixed-point numbers. For storing positions, they're great. You only need to subtract positions from each other, and add velocities to positions. You almost never need to do crazy stuff like multiply two positions together or anything - that doesn't even "mean" anything. 64-bit fixed-point integers are supported on pretty much every platform. Even if they're not natively supported, they're easily emulated with things like add-with-carry instructions (and those platforms will have awful float64 support anyway).\n\nPlus they have the big advantage that you actually get all 64 bits of precision. A float64 only has 52 bits of precision. But I hear you say - it's not constant precision - floating-poiint numbers have adaptive precision. Yes they do - and that's actually a problem. The problem is that your code works differently depending on your distance from an arbitrarily-chosen point in space - the origin.\n\nThis has huge implications for robustness and testability. If you're like me, when you develop code, you do so in a tiny sandbox game level so you don't have to spend five minutes loading the assets for a full level. And yay - everything works fine. The physics works, the gameplay movement works, there's no falling-out-of-maps stuff, everything is smooth. So you check it in and then several weeks later someone says there's some shitty stuff happening in level X. So you test level X and it seems fine, and then you have the "works on my machine" argument with the tester, and it takes you several days to figure out that it only happens in one part of level X, and that's the part where you're a long way from the origin and some part of the physics code is subtracting two large numbers and getting an imprecise result.\n\nThe nice thing about fixed point is it's consistent. You get the same precision everywhere. If your time step and physics epsilons work at the origin, they work everywhere. And before you moan that fixed-point doesn't have the range - 32 bits of fixed-point gets you anywhere on Earth to about 3mm. Not enough? Well 64 bits of precision gets you to the furthest distance of Pluto from the Sun (7.4 billion km) with sub-micrometer precision. And it's all consistent precision - no lumpy parts or fine parts or places where you suddenly get denormals creeping in and your performance goes through the floor for no well-identifiable reason.\n\nI think the easiest way to encapsualte this (if you're an OOP fan) is to make a "position" class that holds your chosen respresentation, and the only operations you can perform on that class are subtracting one from another to give a standard float32 vec3, and adding a float32 vec3 to a position to get another position. So it means to can't really use a position without doing calculations relative to another position. You should find that that is all you actually need. If not, then it's time to refactor some code, because you're likely to have problems when doing things at different dstances from the origin.\n\nThe one thing I have seen is people making BSP trees out of the entire world, and then storing the planes as A,B,C,D where: Ax+By+Cz+D>0. Which is fine, except that's also going to have precision problems.And it's going to have precision problems far earlier, because x,y,z are another object's position. And now you're multiplying two positions together and subtracting a bunch of them and asking if they're positive or negative and the problem with that is that you will halve your available precision. So even doubles will only get you 52/2 = 26 bits of precision, which is rubbish. And experience with BSP trees has shown that they're extremely intolerant of precision errors. So yeah - that sort of stuff just makes me wince. You really do need to store a point on the plane and the plane's normal. Otherwise even decently-size levels are going to end up in agony (Michael Abrash has a story exactly like this about Quake 1 - they had tiny tiny levels and they had problems!)\n\nSo I had a point several decades ago, and that was that I have a nice compact OffendOMatic rule: any time you use doubles in a game, you haven't understood the algorithm.
I've always hated two things about sprintf. The main annoyance is that you have to have the format string, and then after that comes all the arguments. It really doesn't read well, and it's too easy to get the arguments the wrong way round or add or forget one. Related to this is the fact that there's no type-safety, so when you do screw up, it can lead to silent and/or really obscure memory-scribbler bugs.\n\nOne alternative is the cout/iostream style format with a lot of << stuff everywhere. That makes my eyeballs itch with the amount of C++ involved, and it's ugly to use with string destinations.\n\nA third option is smart strings, but the complexity and constant alloc/dealloc behaviour also offends me. I'll be honest - I worry a lot of that magic allocation stuff is just far too complex for mortals to debug or profile. It's far too easy to write a line of simple-looking code and destroy performance.\n\nI was hoping not to reinvent the wheel, and a few people pointed me at [[FastFormat|http://www.fastformat.org]]. It has a mode that seems to sort-of do what I want, but the documentation is a disaster and the scope of the thing is clearly huge and way beyond what I need, so rather than wade through it for two hours, I thought I'd see if I could write something simple in two hours that does what I want. Turns out, the answer is "yes".\n\nSo, the goals are:\n* No dynamic allocation.\n* Easier to read and control than sprintf()\n* Type-safe (in as much as C can ever be type-safe - bloody auto-converts).\n* Overflow-safe (i.e. it will refuse to scribble, and will assert in debug mode).\n\nThe thing I'm happy to give up is extendable-length strings, i.e. I don't mind pre-declaring the max length of any string. That's kinda obvious since I don't want dynamic allocation, but it's worth stating explicitly.\n\nSo the very first thing I wrote almost worked. The first bit was easy - it's this:\n\n{{{\nstruct StaticStr\n{\n unsigned int MaxLen;\n char *pStr;\n\n StaticStr ( char *pStrIn, int MaxLengthIn )\n {\n MaxLen = MaxLengthIn;\n pStr = pStrIn;\n }\n\n ~StaticStr()\n {\n // Nothing!\n }\n\n // Cast to a normal string. It's deliberate that it's not a const (but be careful with it).\n char *Cstr()\n {\n return pStr;\n }\n\n // Assignment.\n StaticStr & operator= ( const StaticStr &other )\n {\n ASSERT ( strlen ( other.pStr ) < this->MaxLen );\n strncpy ( this->pStr, other.pStr, this->MaxLen );\n this->pStr[this->MaxLen-1] = '\s0';\n return *this;\n }\n\n // Assignment from a C string.\n StaticStr & operator= ( const char *pOther )\n {\n ASSERT ( strlen ( pOther ) < this->MaxLen );\n strncpy ( this->pStr, pOther, this->MaxLen );\n this->pStr[this->MaxLen-1] = '\s0';\n return *this;\n }\n\n StaticStr & operator+= ( const StaticStr &other )\n {\n int ThisLength = strlen ( this->pStr );\n ASSERT ( ( ThisLength + strlen ( other.pStr ) ) < this->MaxLen );\n strncat ( this->pStr, other.pStr, this->MaxLen - ThisLength - 1 );\n return *this;\n }\n\n // This is actualy an append - it's really += rather than + but the typing gets tedious.\n StaticStr & operator+ ( const StaticStr &other )\n {\n return *this += other;\n }\n\n // Append of a C string.\n StaticStr & operator+= ( const char *pOther )\n {\n int ThisLength = strlen ( this->pStr );\n ASSERT ( ( ThisLength + strlen ( pOther ) ) < this->MaxLen );\n strncat ( this->pStr, pOther, this->MaxLen - ThisLength - 1 );\n return *this;\n }\n\n StaticStr & operator+ ( const char *pOther )\n {\n return *this += pOther;\n }\n};\n\n// Slightly ugly...\n// Used like:\n// StaticStrDecl ( sstemp1, 1024 );\n#define StaticStrDecl(name,length) char StaticStringCalled ## name[length]; StaticStr name ( StaticStringCalled ## name, length )\n// char *pTemp[1024];\n// StaticStrWrap ( sstemp1, pTemp );\n#define StaticStrWrap(name,strname) StaticStr name ( strname, sizeof(strname)/sizeof(strname[0]) )\n}}}\n\nAnd that works fine for sticking strings together, so you can do code like this:\n\n{{{\n StaticStrDecl(string1, 1024);\n char TempString[100];\n StaticStrWrap(string2, TempString);\n string1 = "Hello";\n string2 = " world";\n string1 += string2;\n string1 += " again";\n}}}\n\n...or more obviously usefully, code like this:\n\n{{{\n string2 = " world";\n (string1 = "Hello") + string2 + " again";\n}}}\n\nNote the annoying braces. If you just do this:\n\n{{{\n string1 = "Hello" + string2 + " again";\n}}}\n\n...the way you'd like to, it fails to compile because there's no + operator that takes a left hand of a char*, and it doesn't figure out that a different precedence would fix things. It's a pretty minor annoyance. I'm sure somebody with more C++ operator overloading experience can tell me how to fix it, but honestly I'm not sure I want to know.\n\nTechnically I shouldn't have allowed the + operator, because it's an append, and you should have to write this:\n\n{{{\n (string1 = "Hello") += string2 += " again";\n}}}\n\n...but I hate the typing. Part of the impetus is to make things easier to read, not harder. Also, for reasons I really don't want to pollute my brain with ''(so don't email me)'' the operator precedence changes, so you actually get:\n\n{{{\n (string1 = "Hello") += (string2 += " again");\n}}}\n\n...which means although string1 is correct, string2 now contains " world again", which is not really want I wanted. Of course you can still get strange things happening in some cases if you forget the braces:\n\n{{{\n string2 = "Hello";\n string1 = string2 + " world"; // I forgot the braces, but this still compiles & runs.\n}}}\n\nNow both of them contain "Hello world". Maybe with some sort of gratuitous const gymnastics I could fix it, but right now I'm going to ignore it and hope it doesn't bite me in the arse later. Yeah, right. Ugh.\n\nAnyway, so what about actual number printing? Well, my first attempt was to use some temp strings and some functions to wrap sprintf, such as\n\n{{{\n StaticStr & Int ( int i )\n {\n int written = _snprintf ( pStr, MaxLen, "%i", i );\n ASSERT ( ( written >= 0 ) && ( (unsigned)written < MaxLen ) );\n return *this;\n }\n}}}\n\nThe way you use it is like this:\n\n{{{\n (string1 = "Hello ") + string2.Int(1) + " world";\n}}}\n\n...and this works. string2 is "1" and string1 is "Hello 1 world". The problem comes when you want to print multiple numbers using the same temp:\n\n{{{\n (string1 = "Hello ") + string2.Int(1) + " world " + string2.Int(2);\n}}}\n\nThe problem is the two Int() functions get called before any concatenation happens, so what you get is "Hello 1 world 1". Why they're called last first is also a mystery - I was at least expecting to get "Hello 2 world 2". It's probably poorly defined anyway. Note that this isn't an operator precedence problem - adding lots of braces doesn't fix the problem. Nuts.\n\nI had a cup of tea and a sit-down and a think and then an alternative came to me. It turns out it's much nicer to type, completely avoids the temporary problem, and is faster (not that speed is exactly a priority, especially when printing floats, but it's never a bad thing).\n\nFirst step, add a seemingly pointless helper class:\n\n{{{\nstruct StStInt\n{\n int mValue;\n StStInt ( int Value )\n {\n mValue = Value;\n }\n};\n}}}\n\nThat's it - that's all it does - no other methods. There's also ~StStFloat. "~StSt" is just an abbreviation for "~StaticStr." You'll see why I want to shorten it in a bit. Then I add this method to ~StaticStr:\n\n{{{\n StaticStr & operator+ ( const StStInt &Other )\n {\n int StrLen = strlen ( pStr );\n int written = _snprintf ( pStr+StrLen, MaxLen-StrLen, "%i", Other.mValue );\n ASSERT ( ( written >= 0 ) && ( (unsigned)written < MaxLen ) );\n pStr[MaxLen-1] = '\s0';\n return *this;\n }\n}}}\n\nMost of the complexity here is dealing with the completely bonkers behaviour of _snprintf and various overflow checking - the actual conversion stuff is simple. Now you can write fairly elegant stuff like:\n\n{{{\n string2 = " world ";\n (string1 = "Hello ") + StStInt(1) + string2 + StStInt(2);\n}}}\n\nYou still need the extra braces, annoyingly, but it works just fine. There's no temporaries either - the values are just _snprintf'ed right onto the end of the existing string. The fact that string2 doesn't get modified is nice as well, though I worry that might be undefined behaviour, so maybe don't push it.\n\nThe next step was to handle formatting, because sometimes you do want it. The routines get only slightly more complex:\n\n{{{\nstruct StStFloat\n{\n float mValue;\n int mPrecision;\n int mMinWidth;\n StStFloat ( float Value, int Precision = -1, int MinWidth = -1 )\n {\n mValue = Value;\n mPrecision = Precision;\n mMinWidth = MinWidth;\n }\n};\n}}}\n...and...\n{{{\n StaticStr & operator+ ( const StStFloat &Other )\n {\n int StrLen = strlen ( pStr );\n int written = -1;\n if ( Other.mMinWidth < 0 )\n {\n if ( Other.mPrecision < 0 )\n {\n written = _snprintf ( pStr+StrLen, MaxLen-StrLen, "%f", Other.mValue );\n }\n else\n {\n written = _snprintf ( pStr+StrLen, MaxLen-StrLen, "%.*f", Other.mPrecision, Other.mValue );\n }\n }\n else\n {\n if ( Other.mPrecision < 0 )\n {\n written = _snprintf ( pStr+StrLen, MaxLen-StrLen, "%*f", Other.mMinWidth, Other.mValue );\n }\n else\n {\n written = _snprintf ( pStr+StrLen, MaxLen-StrLen, "%*.*f", Other.mMinWidth, Other.mPrecision, Other.mValue );\n }\n }\n pStr[MaxLen-1] = '\s0';\n ASSERT ( ( written >= 0 ) && ( (unsigned)written < MaxLen ) );\n return *this;\n }\n}}}\n\nAnd that means you can do elegant things like:\n\n{{{\n (string1 = "Hello ") + StStFloat(1.0f) + ":" + StStFloat(2.0f, 2) + ":" + StStFloat(3.0f, -1, 15) + ":" + StStFloat(4.0f, 3, 10);\n}}}\n\nWhich produces string1="Hello 1.000000:2.00: 3.000000: 4.000". Don't ask me why the default precision for _snprintf likes so many decimal points.\n\nAnyway, so I got my wish - I got a zero-dynamic-allocation, typesafe, format-free and fairly readable version of sprintf. Happy coder joy joy!\n\nYour mission, should you choose to accept it, is to go out and find the wheel that I just reinvented, thus saving me the hassle and embarrassment of using this library evermore. This blog will self-destruct when I get around to it.\n
Someone prodded me about this the other day, so I thought I should get on and do it. I gave a GDC 2003 talk about SH, but I was never really happy with it - half an hour isn't really enough to cover it well. I haven't changed the slides, but now I have some notes for it. I corrected an error, finally wrote those elusive fConstX values down, but mainly I talk about some surrounding details about the console implementation I actually used it in - what is probably the coolest bit of all - or at least it's the bit the artists really liked about the new system. [[Spherical Harmonics in Actual Games notes]]
The rest have been removed from the front page to save space. Here are all my blog entries in chronological order:\n\n[[Sparse-world storage formats]]\n[[Matrix maths and names]]\n[[New Job]]\n[[A sprintf that isn't as ugly]]\n[[Saving, loading, replaying and debugging]]\n[[StarTopia love]]\n[[Logging, asserts and unit tests]]\n[[Data Oriented Luddites]]\n[[Moore's Law vs Duck Typing]]\n[[Platform-agnostic types]]\n[[Texture streaming systems with two levels of cache]]\n[[Visibility and priority tests for streaming assets]]\n[[Squares or hexes]]\n[[Dwarf Minecart Tycoon]]\n[[Larrabee talk roundup and media attention]]\n[[GDC 09]]\n[[How to walk better]]\n[[Larrabee ISA unveiled at GDC 2009]]\n[[CAs in cloud formation]]\n[[Regular mesh vertex cache ordering]]\n[[Siggraph Asia 2008]]\n[[Plague]]\n[[Siggraph 2008]]\n[[ShaderX2 available for free]]\n[[Larrabee and raytracing]]\n[[Renderstate change costs]]\n[[Larrabee decloak]]\n[[Blog linkage]]\n[[Smart coder priorities]]\n[[Rasteriser vs Raytracer]]\n[[Texture formats for faster compression]]\n[[Knowing which mipmap levels are needed]]\n[[SSE]]\n[[Patently evil]]\n[[Trilights]]\n[[Bitangents]]\n[[More vertex cache optimisation]]\n[[Reinventing the wheel - again!]]\n[[Shadowbuffers and shaders]]\n[[Utah Teapot]]\n[[GDC survival guide]]\n[[Pixomatic slides]]\n[[Notepad++]]\n[[More on SH]]\n[[Licenses]]\n[[Added some hindsights and notes on Spherical Harmonics]]\n[[Cellular Automata article added]]\n[[Impostor article added]]\n[[Shadowmap vs shadowbuffer]]\n[[Vertex Cache Optimisation]]\n[[Strippers]]\n[[Scene Graphs - just say no]]\n[[Dodgy demos]]\n[[Premultiplied alpha]]\n[[Babbage was a true genius]]\n[[Someone cited my VIPM article]]\n[[VIPM article in HTML]]\n[[RSS part 3]]\n[[VGA was good enough for my grandfather]]\n[[RSS part 2]]\n[[A matter of precision]]\n[[Game Middleware list]]\n[[RSS banditry]]\n[[...there was a load of words]]\n[[In the beginning...]]
A set of articles set up by Mike Action and friends: http://altdevblogaday.com/
I've just been reading up on the Analytical Engine. I've known the background to Babbage's life and works for ages, and I knew he was incredibly clever - the Difference Engine is an amazing feat when you consider it's made of mechanical gears and powered by a human cranking on a lever. But fundamentally all it does is a bunch of parallel additions. For the time, it would have done them extremely fast and have had awesome reliability, but the actual computations were perfectly well-understood. By the way, if you're ever in London, you have to go visit the Science Museum in South Kensington and see the [[Difference Engine Mk2|http://en.wikipedia.org/wiki/Difference_engine]] that they built. It's truly fascinating watching it work, and work well. The thing I didn't realise until I read a bit more about it is that it's actually built to achievable 19th-century tolerances, which means Babbage actually could have built the whole thing, and it would have worked convincingly and usefully. His problems were political and financial rather than mechanical, and it didn't help that he was a pompous jerk (like many geniuses).\n\nBut again, perfectly well-understood maths in the Difference Engine. Couldn't do anything revolutionary, just would have done it far better than the existing mechanisms (a room full of people adding stuff manually). No, the real genius came with the Analytical Engine. Again, I've always known it was the first programmable computer, but when people say that - you always imagine that well yes, it was slightly smarter than a [[Jacquard Loom|http://en.wikipedia.org/wiki/Jacquard_loom]], and maybe if you squinted a bit and jumped through many flaming hoops you might see it was getting close to Turing-capable, and if you examined manuals a lot and were cunning you could get it to do something kinda like a proper algorithm. Certainly when I've looked at the functionality of some of the 1940s and 50s computers, that's what they always looked like.\n\nNo, that's not the Analytical Engine at all. It's not a bag of bolts that some mathematician can show in 200 pages of jargon can be made to be Turing-complete. It's much better than that - it's basically a 6502 with bells on (literally).\n\n[[Here are lots of details|http://www.fourmilab.ch/babbage/cards.html]] (including an emulator!), but basically it has a fairly straightforwards machine language - it does addition, subtraction, multiplication and division. It has two input registers and an output register. You can do loads from a store (memory) with 1000 entries, to the input registers, and when you load the second register, it does the operation you ask, and puts the result in the output register, which you can move to the store. You can also do shifts left and right to move the decimal point around in fixed-point maths. If the result of a computation has a different sign to the first input, or overflows, it makes a "run-up lever" move upwards. One could imagine tying a small bit of cloth to this lever, and then one might term this "raising a flag". You can issue commands that say to move backwards or forwards by a set number of commands, and you can tell it to do this all the time, or only if the run-up level is raised. Hey - unconditional and conditional branches.\n\nI mean that's it - it's so absurdly simple. It has load, store, the basic six arithmetic operations, and conditional branches. Right there in front of you. It's a processor. You could code on it. Look, here's some code (with my comments):\n\n{{{\nN0 6 ;; preload memory location 0 with the number 6\nN1 1 ;; preload memory location 1 with the number 1\nN2 1 ;; preload memory location 2 with the number 1\nx ;; set the operation to multiply\nL1 ;; load memory location 1 into the ALU\nL0 ;; load memory location 0 into the ALU - second load causes the multiply operation to happen\nS1 ;; store the result in location 1.\n- ;; set the operation to subtract\nL0 ;; load location 0\nL2 ;; load location 2 - causes subtraction to happen.\nS0 ;; store to location 0\nL2 ;; load location 2\nL0 ;; load location 0 - causes subtraction to happen.\n ;; If the result sign is different to the first argument, the run-up-lever is raised\nCB?11 ;; if the run-up-lever is raised, move back 11 instructions.\n ;; Like today's CPUs, the location of the "instruction pointer" has already moved past this instruction,\n ;; so back 11 means the next instruction is the "x" above.\n}}}\n\nThe result of this function is in location 1. Notice that location 2 never gets stored to - it's the constant value 1. Still having a bit of trouble - let me translate it to C:\n\n{{{\nint mystery_function ( int loc0 )\n{\n int loc1 = 1;\n const int loc2 = 1;\nkeep_looping:\n loc1 = loc1 * loc0;\n loc0 = loc0 - loc2;\n if ( sgn(loc2 - loc0) != sgn(loc2) )\n {\n goto keep_looping;\n }\n return loc1;\n}\n}}}\n\n...and now change "loc2" to be "1" and massage the "if" conditional appropriately:\n\n{{{\nint mystery_function ( int loc0 )\n{\n int loc1 = 1;\nkeep_looping:\n loc1 *= loc0;\n --loc0;\n if ( loc0 > 1 )\n {\n goto keep_looping;\n }\n return loc1;\n}\n}}}\n\nYou can't still be having trouble. It's a factorial function! Apart from the slight wackiness of the loop counter, which is an idiom just like any other language, it's all absurdly simple and powerful. The only big thing he was missing from the ALU was direct boolean logic for doing complex control logic. But it's not a huge lack - first, it can all be emulated without too much pain with decimal logic (OR is an add, AND is a multiply, etc), and secondly after about five minutes playing with the thing he would have realised that he needed something like that, and the mechanicals for boolean logic are trivial compared to decimal multiplication and division.\n\nJust to gild the lilly, Babbage wanted to attach a plotter to it. You could send the most recent computed result to the X-coordinate or Y-coordinate of the plotter, and you could raise or lower the pen. He could have coded up Tron light-cycles on the damn thing.\n\nThe one thing he missed (and to be fair so did half the computer pioneers of the 1940s) is the ability to compute addresses, i.e. instead of always looking up the data in location N where N is punched into the card itself, the ability to calculate N with maths. However, he did think of precomputed lookup-tables, which is one form of this, but only loading from it, not storing to it. Amusingly, he then decided that actually, since the table - e.g. of logarithms - had been generated by the AE in an earlier run, and that doing the lookup involved the AE displaying a number and then ringing a bell and getting a man to go rummage through a stack of punched-cards for the right one, which would take time and be error-prone (he also had a primitive error-checking system!), it might just be better if the machine recomputed the desired value there and then, rather than using a LUT at all. Which is almost exactly where we are in computing now, where LUTs are frequently slower and less accurate than just re-doing the computation!\n\nAnd he did all of this in a total vacuum. He had nothing but paper and pen and brass and plans and his brains, and he invented a frighteningly useful, powerful and easy-to-program processor. Including comments and debugging support. They didn't even have a machine as powerful as a cash-register at the time - nothing! So when I say Babbage was a genius - he's one of the greats. If he'd succeeded even slightly, if the Difference Engine had got half-finished and produced useful results and people had taken him seriously - all that Turing and von Neumann stuff that we think of as the birth of modern computing - obsolete before they even thought of it, because Babbage got there nearly a century before.\n\nPut it like this - if he had been born almost exactly 100 years //later// and instead of brass and steam had worked in relays and electrickery in the 1950s, he would have made almost exactly the same breakthroughs and probably built almost exactly the same machine - it just would have been a bit faster (but less reliable - bloody valves!). It's almost shameful that in those 100 years, nobody advanced the art of computing in the slightest. And yet think of all the things that have happened in the 50 years since then. I read Sterling and Gibson's book "The Difference Engine" ages ago, and thought it was all a bit of a flight of fancy - pushing the probability curve to breaking-point - the way SF authors should. Now - I'm not sure it was at all fanciful. Imagine if we were living in a world that had had computers for three times as long. Gordon Moore's famous law would have gone totally ballistic by now. Microsoft's motto would have been "a computer in every can of coke", because we'd have had a computer on every desk since about 1910, and the big thing that helped the Allies beat the Nazis wouldn't have been radar, it would have been the Web.\n\nThere's just one bizarre thing I can't figure out. Babbage initially specified the Analytical Engine to 40 //decimal// digits. Later, he upped it to 50. 50 decimal digits is about 166 bits. That's gigantic, even for fixed-point arithmetic. These days we find 32 bits just fine (9-and-a-bit decimal digits) - 64 in some specialised places (19 digits). But 166 bits is nuts - that is collossal over-engineering. And it's not because he didn't understand the maths - it's extremely clear from his writings that he understood fixed-point math perfectly well - shifting stuff up and down after multiplication and division, etc. In two separate places he explained how to do arbitrary-precision arithmetic using the "run-up lever" (i.e. the AE version of "add-with-carry") in case people needed 100 or 200 digit numbers. To put this in perspective, the universe is at least 156 billion light-years wide - that's 1.5 x 10^27 meters. A single proton is 10^-15 meters across. So the universe is roughly 1.5 x 10^42 protons in size - which only takes 43 decimal digits to represent. Babbage decided that wasn't enough - he needed the full 50 digits. Also, he specified a memory of 1000 entries of these 50-digit numbers, and that didn't include the code, just data - that's 20kbytes of random-access data. For a smart guy whose major problems were finding good enough large-scale manufacturing technologies and finding the cash to build his gigantic room-sized engine, that seems pretty dumb. If he'd built a 3-digit AE, his instruction set would have exceeded the computing capabilities of every 8-bit machine in the home microcomputer revoloution of the 1980s. 5 digits and he'd have beaten the 16-bit PDP11, ST, Amiga and 8086 for per-instruction computing power. If he'd only aimed a little lower, he could almost have built the thing in his own workshop (his son constructed a three-digit Difference Engine from parts found in his house after his death!), instead of spending half his time arguing with his fabrication engineers and begging parliament and the Royal Society for ever-increasing sums of money. What a waste. But still - a genius of the finest calibre.\n\nMy theory is, Babbage was actually given a Speak-and-Spell by Dr. Who. Or maybe Sarah Connors. It's the only rational explanation.
I heard an eye-opening rant by Chris Hecker the other day.\n\nNo, not //that// rant - I already knew the Wii was one and a half GameCubes stuck together with duct tape. No, the one about bitangents.\n\n"The what?" I hear you ask. "You mean binormals?" As it turns out - no. The rule about what "bi" means is that some objects have an arbitrary set of properties, and you need two of them. It doesn't matter which two - you just need them to be two different examples. So a curve in 3D space has not just one normal, but loads of normals - a whole plane of them in fact. But you only need two of them - pretty much any two. You call one the "normal" and you call the other the "binormal". The standard classical-language hijack of "biX" just means "two X" or "the second X". And that's where binormal comes from - curves. Note that a curve only has a single tangent vector (well, OK, it has two, but one is the negative of the other).\n\nOK, so now on surfaces, you have a single normal. There really is only one normal. But there's loads of tangents - an entire plane of them - the plane of the surface. And so you need to pick two of them (typically if the surface is textured we're concerned with the lines of constant V and the lines of constant U). So one is called the tangent, and logically the other one should be called the "bitangent". Yes, you heard me - not "binormal".\n\nAnd that's the rant. When doing bumpmapping on a surface, you should talk about the normal, tangent and //bitangent//. Nobody's quite sure why people cottoned on to the word "binormal", and I've certainly spent the last ten years talking about binormals, but you know what - it's still wrong. From now on, I will speak only of bitangents (though that's going to cause chaos with Granny3D's nice backwards-compatible file format).\n\nHere endeth the lesson.
The game of the film, made by [[MuckyFoot]]. Except it had almost nothing to do with the film because we were making it at the same time as the film, didn't know much about their plot, so we just used the same characters in a different story. It had some neat features. The 360-degree fighting was kinda nifty - it used dual-stick "Robotron melee" controls - left one moved Blade, right one would attack in the direction pushed, and the exact attacks used were all down to timing and which weapon you had out.\n\nIt came out on Xbox and ~PS2, and I wrote the Xbox rendering engine and all the other platform-specific stuff (sound, controllers, save/load, etc). From the start we decided the two platforms were too different to share much code, and we were mostly proved right, though in the end we did share more than we thought - mainly the streaming code and some of the lighting stuff. Originally the ~PS2 wasn't going to try to stream data, because we assumed the DVD seek times would kill us, but it worked much better than we thought, and the extra memory really helped. I'm fairly proud of the graphics engine - got a lot of good tech in it such as streaming almost all rendering resources, some neat code to deal with multiple dynamic light sources, compile-on-demand shaders, and some really good VIPM stuff. Other Xbox-only features were some nice self-shadowing with my first experiments with shadowbuffers, a cool-looking cloth simulation for Blade's cloak.\n\nIncidentally, the lighting was a huge pain. You see, Blade is this black guy who wears black leather and black shades, and he hunts vampires, so we're not going to have any bright sunlit levels - we're pretty much always in sewers at night. Result - he's basically invisible. It's fine to show him as a lurking shadow for a film, but in a game you kinda have to be able to see what's going on. Pretty much the only reason you could see him at all was because we cranked up the specular highlights like crazy, and made everything else in the scene fairly bright colours, so at least he's silhouetted.\n\nDespite being happy with it on a technical level, it was still a rather rushed game, and had some pretty rough edges. Some of the levels were rather dull - it would have been better if it had been a shorter game, but back then it wasn't acceptable to ship anything less than 20 hours of play, even if a lot of that play was a bit dull. C'est la vie.
If you're going to add this blog to your blogroll, and thanks very much for doing so, please use http://www.eelpi.gotdns.org/blog.wiki.html, and not the address you actually see in your browser - that one will change if/when I move ~ISPs. Thanks.
Pascal Sahuc emailed me a link to a neat paper on cloud simulation with a cheap CA. [["A Simple, Efficient Method for Realistic Animation of Clouds"|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.4850]] by Dobashi and Kaneda. In fact it's absurdly cheap - each cell has only three bits, and it's all boolean tests. I'm amazed that such wonderful results come out of such simple rules and three bits of state. They do smooth the field before rendering, but it still [[looks really good|http://video.google.com/videoplay?docid=3133893853954925559]].
Another article got converted to HTML, and some hindsights. I wish I'd found a fun game to put the code in - it seemed to work really well, but I couldn't think of anything really cool to do with it. [[Papers and articles section|http://www.eelpi.gotdns.org/papers/papers.html]]
My hobby game [[Dwarf Minecart Tycoon]].
I went over some of the fundamentals behind the Data Oriented Design movement in [[Moore's Law vs Duck Typing]]. I'm not going to actually cover what DOD is because I honestly don't (yet) know, and I'm not sure even the main proponents do either. We understand the problem, and we have some examples of solving specific cases, but I'm not sure there's yet a good framework of rules or (wouldn't it be nice) set of libraries to make sure you don't have the problem in the first place while also keeping code readable and flexible. I've also not been paying //that// much attention since I'm no longer a full-time games coder - although I still write plenty of code, the complex stuff doesn't have to be super fast, and the stuff that has to be fast isn't very complex. So I'll leave it to others to explain DOD since I'll just screw it up.\n\nBut what I do want to make is a meta-comment. The DOD folk are widely seen as a bunch of ninja curmudgeon bit-heads. Although I think they'd probably be quite proud of that label, the image it conveys is of folks who can hand-tune any assembly routine known to man but loathe anything higher-level than C99. They can argue about set associativity, DTLB walking and tell you how many clock cycles it is to main memory. Which is all very clever, but if //you// don't know that stuff they think you're Not A Well Rounded Programmer, and if you use the word "virtual" they'll commence much wailing and gnashing of teeth. In the other corner we have hipsters coming out of university courses with heads full of Java and Python and looking blankly at you when you talk about bitfield manipulation and the justifiable use of "goto". When you say "compile" they think you mean "tokenize", and they regard exception-handling as part of standard control flow rather than an admission of failure in code design.\n\nSo unfortunately, the DOD message gets conflated with the other two big language flame-wars that have been going on for a while. So let me clearly enumerate them all:\n\n1. Object Oriented Programming is not the answer to everything.\n2. C++ is a terrible implementation of an OOP language.\n3. Using lots of indirections can trash your caches, destroy performance, and you'll never be able to find it again.\n\n.#3 is the big new DOD thing we're all ranting about.\n\n.#2 is really difficult to argue too hard against. There's so many examples of better languages out there, and so many examples of dumb bugs caused by C++ implementation decisions. And amusingly it's one thing the hipsters and the 8-bitters can completely agree on. But it's also difficult to do anything about it. It's a practical reality that nobody is going to switch to another language any time soon, and the current leading contenders (Java, Python) smack into problem #3 pretty hard. Of course we should all switch to LISP - let me know when that happens and I'll join you dancing around the bonfires of our editions of K&R, but until then I need it by my side to debug some muppet's lack of parentheses or to look up for the zillionth time how to declare a function returning a pointer to an array of const function pointers. I put bitching about C++ in the same bucket as complaining about the frailties of the human spine and why Macs only have one mouse button - there's nothing you can do about it, so just shut up and deal with it.\n\n.#1 is really the only interesting discussion. It's certainly not new and the DOD folk didn't start it, but for most games coders the DOD context may be the first place they've seen people being quite so brazenly ~OOP-phobic. Even then it's not //that// interesting a discussion, because the answer is clearly not "OOP has no uses", nor is it "OOP is the answer to everything", but somewhere in the middle, so it's the position of that line that matters.\n\nI happen to think that OOP is overused. In particular I think OOP is used when the thing you want is not //objects// but //abstractions//. Unfortunately the major way C++ knows how to do abstractions is with objects. Thus, any time you want an abstraction (e.g. a cache of previous results) the path you end up getting pushed down by C++ is OOP ("make an object that represents the cache, and now make it inherit from..." aaaarg''gghhh!''), and so what we have is a bunch of ~OOPy shitty unreadable code. But it's shitty because it's ~OOPy and it shouldn't be, not because OOP is inherently a bad design paradigm (when it's appropriate).\n\nThe other thing I see is people thinking DOD and OOP are opposite ends of the spectrum. This is C++'s fault again, though to be fair a bunch of other ~OOPy languages screw it up as well. In C++ when you have an object (a real game object that is - like a tank or a person) with a bunch of member values such as (position, orientation, health, ammo), it puts all those values contiguous in memory. But DOD says that the rendering only cares about the first two items, so why are you polluting your caches with the other two when rendering? You should have two separate arrays (pos+orient) and (health+ammo) and have one object reference separate items in each array. That's actually totally fine by OOP - all it says is that you want to be able to type obj->pos and obj->ammo, you want to be able to call obj1->~ShootAt(obj2) and so on - it doesn't say anything at all about how the data structures should be laid out. So in theory DOD and OOP should be completely compatible - OOP is a design and syntax philosophy, while DOD cares about data layout and traversal order. An example of a language where these two concepts are kept separate are databases - how you access the data (i.e. views/queries) is decoupled from how the data is stored. Unfortunately, C++ makes it really difficult to separate data layout from syntax. I suspect you can do it with gratuitous overloading and/or templating, or you can force everything to use accessor functions, but then the maintainability of the code plummets rapidly. So yes, //in C++// DOD and OOP tend to tug in different directions. But again, that's just K&R&S kicking you in the nuts. You really should be used to it by now.
[[HelloThere]] [[Sparse-world storage formats]] [[Matrix maths and names]] [[New Job]] [[A sprintf that isn't as ugly]] [[Saving, loading, replaying and debugging]] [[StarTopia love]] [[Logging, asserts and unit tests]] [[Data Oriented Luddites]] [[Moore's Law vs Duck Typing]] [[Platform-agnostic types]] [[Texture streaming systems with two levels of cache]] [[Visibility and priority tests for streaming assets]] [[Squares or hexes]] [[Dwarf Minecart Tycoon]] [[All blog entries]]
...in progress...\n\nFP precision problems between machines.\n\nCompiler optimizations.\n\nChanging "constants".\n\nRecording & holding RNGs\n\nPlayer input record (now need to append to a savegame?) and playback.\n\n
A way of compressing vertex data using the principles of displacement mapping, but without needing complex hardware support.\n\nThe idea is to encode the vertex as three indices to other vertices, plus two barycentric coordinates, plus a displacement along the interpolated normal. This gives you a vertex that is about 8 bytes in size, which is far smaller than most representations, reducing memory footprint and bandwidth requirements. This stream can be easily decoded by ~VS1.1 shader hardware, which is now ubiquitous.\n\nFor more details see:\n* Dean Calver's article in ~ShaderX2 "Using Vertex Shaders For Geometry Compression" [[(available for free now)|http://www.realtimerendering.com/blog/shaderx2-books-available-for-free-download/]]\n* [[My GDC2003 paper on Displacement Mapping|http://www.eelpi.gotdns.org/papers/papers.html]]
If you're going to release a demo, try and minimally playtest it first on some people who (a) have not already played your game and (b) have $50 they can spend on anything they like. If for some bizarre reason they fail to immediately hand over the $50 in exchange for a full copy of the game, you might want to think about tweaking your demo.\n\nHere's the checklist. It's not exactly rocket-science, but it's truly astounding how many manage to violate not one point on it, but every single one. It's even more astounding how random downloads from thid- and fourth-tier publishers on Fileplanet or whereever manage to consistently score several bazillion points above the first-tier publishers parading their stupidity on the heavily-controlled and scruitinised Xbox Live. Maybe that's because those lowlier publishers have to work for their money - who knows?\n\n(1) To paraphrase Miyamoto - a late demo can eventually be good, while a rushed demo is bad forever. To put it another way - the demo isn't for the fanbois who read the gaming press and would buy a turd-in-a-box from comapny X. The demo is for the non-hardcore looking for something new. If you release the demo early and it sucks, you will turn away everyone, including the fanbois. If you ship it after the game goes to master and make sure it's not horrible, it will have the most glaring bugs fixed, be more balanced, and you might pull in some longer-term interest and keep yourself from being hurled into the bargain-bin two weeks after release.\n\n(2) If at all possible, include the tutorial. I know you think your precious snowflake is so intuitive it's just pick-up-and-play. But again remember - your demo is NOT for the fanboy who bought the first eight episodes of your game series. It's to find the people who don't have preorders. They may not know how to play your game. They may have never played a game like yours. They may have never played any game. That is probably why they have a surplus of $50 bills. One of them could be yours.\n\n(3) If you really can't be bothered with a tutorial, at least have some help screens that tell people what the buttons on the controller do. Maybe even what the strange bars on the screen do, or what the icons on the mini-map do, and what to do about each of them. Especially when they flash - flashing says "I'm important". Not explaing what important stuff is shows disdain for the player. And don't make those screens be the loading screens because guess what - they vanish while the player is reading them. Nothing shows a new player the middle finger quite so directly as a screen full of important text that you then don't let them read properly. They haven't even pressed a button and already they hate you!\n\n(4) Make the demo nice and easy. Demos are not there to present a challenge to the player, they're there to demonstrate the full set of features to them. So for example in a certain fighting game renowed for its boob physics, you might want to slow everything down a bit and make the computer opponents not quite so ... well ... ninja. Then the player will actually be able to experiment with all the cool throws you spent so long doing the game code and animation for, instead of being mid-air juggled for 75% of the health bar the instant they do anything but madly bash the uppercut button.\n\n(5) What is your Unique Selling Point? Why is your game teh ace and all others teh suk? After all, that is why you spent 500 man-years on this product, and that is why your publisher is beating you over the head about deadlines. I may be talking crazy here, but I'm thinking you may wish to explain it to the player. If you don't, the player will be looking at your product and thinking deep philosophical questions such as "why would I spend $50 on this?" You do not want them thinking those sort of questions - you want them thinking "wow - I'm glad I spent $50 on this brilliant game." If you can't manage that, at least go for "shame I spent $50 on this game - the USP sounded cool but it wasn't that much fun after level 8". Your bank manger is equally happy with either.\n\nThis is your first contact with a large chunk of your potential audience. They have $50, and they are deciding whether to give it to you, so you can feed your wife/husband/kids/cat/llama/habit. Marketing has done its job, and the punter is finally devoting their full and undivided attention to your game and playing your demo. Now it's all up to you - this is your fifteen minutes of fame. If that person doesn't like your demo, it doesn't matter how hard you crunched to get the game out the door.
As a side hobby from the SuperSecretProject, I've been tinkering with writing a game. When I was writing games professionally I really didn't feel like doing more games at home, but now it's a refreshing change from designing instruction sets. However, the question was - what game to do? I didn't want to try to think up something new (dammit Jim, I'm a programmer not a game designer!), so I just blatantly stole concepts from two of my favourite games - [[Dwarf Fortress|http://www.bay12games.com/dwarves/]] and [[Transport Tycoon|http://en.wikipedia.org/wiki/Transport_Tycoon]]. I like them for very similar reasons - you set up a system, and little autonomous agents run around in that system. That sort of indirect influence is really appealing to me as a programmer as well as a game-player. So starting with the high-level concept of "Dwarf Fortress with trains", it could go pretty much any place really.\n\nThe point of this is to do things I've not really done before, such as writing map structures, managing inventory, path-finding, simple UI, load/save. Whereas I have already done tons of graphics and sound stuff, so for this project I literally don't care about graphics. I literally have a bunch of teapots running around in a gouraud-shaded fixed-function-vertex-shader environment. I'm just using the ~D3DXCreate* functions and abusing my poor graphics card with the number of draw calls, and I refuse to care! This is very much in the spirit of Dwarf Fortress, which uses a full ~OpenGL pipeline to emulate a 16-colour ASCII terminal - though I do at least have a proper 3D camera for the sake of playability. Much as I love DF, the 2D-only interface is a huge barrier to gameplay.\n\nI have no pretensions to ever ship the game - it may not even ever be fun to play (since my goal is far more that it should be fun to program!). It's certainly not playable now. But as a programmer there's lots of interesting little things that have kept me intrigued that have never really come up in my life as a graphics coder before. So I thought I'd write about some of them.
With the usual anti-bot obfuscations, my address is: tom ~~dot~~ forsyth ~~at~~ eelpi ~~dot~~ gotdns ~~dot~~ org.
Bonus karma points to Ben Garney of Garage Games who spotted a silly math error of mine in [[Knowing which mipmap levels are needed]]. Now corrected.
Lots of fun. Slightly quieter than last year, but according to many it was because companies are only sending the key people, rather than the massed hordes. Also seemed like a lot fewer of the random mousemat manufacturers, DVD duplicators and the like.\n\nMichael Abrash and I gave our talks. He almost filled his gigantic 1000-person room, and my little 300-person room was totally full - not even standing room available. I made sure I left 10 minutes at the end of my talk for questions, but we ended up with half an hour of questions and they had to kick us out of the room for the next session. So I call that a success.\n\nA general site for all your Larrabee needs is: http://www.intel.com/software/larrabee/\n\nOur slides are available here (scroll down near the bottom): http://intel.com/software/gdc\n\nMichael Abrash's article in Dr. Dobbs Journal is now live: http://www.ddj.com/architect/216402188\n\nThe "C++ Larrabee Prototype Primitives" available from here: http://software.intel.com/en-us/articles/prototype-primitives-guide/ These allow you to write Larrabee intrinsics code, but then compile and run it on pretty much anything just by adding a #include into the file. The intent was to make it as easy as possible to prototype some code in any existing project without changing the build.\n\nI went to see [[Rune|http://runevision.com/blog/]]'s talk on locomotion (see [[How to walk better]]), and it was really superb. Great tech, great (interactive!) slides, and he's a really good presenter. If you've read his stuff before, go see it again as there's a bunch of stuff I hadn't seen before - tricks like how to stop the ankle flexing bizarrely as people step down from high places.
I was asked recently what the best way to approach [[http://www.gdconf.com/|GDC]] was for the newbie. GDC is a big busy place, and it's easy for people to get a bit lost. Here's a few handy hints:\n\n* Take an hour beforehand to sit down and go through the entire timetable. In each timeslot, mark the top three things you'd like to see, and give them a score from "might be interesting" to "must see". Remember to at least look at the tracks that aren't your main discipline - they sometimes have interesting or crossover stuff. Write these scores down, and when you actually get to GDC, transfer them onto the at-a-glance schedule thingie you get in the welcome pack, then you can refer to it quickly during the day.\n\n* Prioritise sessions you don't know much about but sound interesting over things that are right in the middle of your primary field. For example, I'm a graphics programmer who's already worked a lot with DX10, so there's probably not a great deal of point going to a talk called "D3D10 Unleashed: New Features and Effects" - it's not going to teach me much new stuff. But I've not all that done much general-purpose multi-processor work, so the talk called "Dragged Kicking and Screaming: Source Multicore" sounds fascinating and will probably give me some new insight when I do start doing multi-processor work.\n\n* Be prepared to miss a lot of the lectures - GDC is pretty chaotic. Allow it to be chaotic - don't try to control it too much. If you're talking to someone interesting, and they're not rushing off to something, keep talking to them. You can always go into the lecture for the second half, or at worst you can pick up the lecture notes later. But you might never meet that interesting person again.\n\n* Don't spend too long on familiar stuff. GDC's meant to be about new stuff, so don't just hang out at a bar and gossip with your friends all day. Hang out at a bar and gossip with //new// people instead.\n\n* Don't spend too long in the expo. Have a quick walk around fairly early on and check the place out, see what might be interesting, but don't hang around or play with stuff. There's two halls this year, and with E3 gone it's even bigger and probably noisier than previous GDCs. If you do find something interesting, note it down for later so when you do find yourself with a spare half hour, or during lunch, you can go back and have a better look. Similarly, if the expo is crowded, go do something else. It's probably lunch-time or something. Come back in an hour and it will be quieter. The booths and the people on them will still be there, and you'll actually be able to see something and talk without shouting. Also don't do the expo on the first day or the last day - that's what everyone else does.\n\n* Remember that there's two expo halls this year.\n\n* If after five minutes of a lecture you find the lecturer is dull, or presents badly, or is just reading off the slides, leave. It's obvious you're going to get just as much from reading the slides later as you will from seeing the talk in person, and it'll be quicker. Your time is valuable - go to the next-best-rated lecture on your list. Some lectures are 10x better in person - those are the ones you should be watching. If you feel too embarrassed to just walk out, take your phone out as if you had it on vibrate and just got a call or a text message, and walk out glaring at it.\n\n* During the lectures, have a notepad and pen out. When you have a question, scribble it down. At the end of the lecture, the presenter will ask for questions, and there's always a deafening silence for a few minutes while everyone remembers what their questions were. That's when you can stick your hand up or go to the microphone and ask your question.\n\n* Fill in those feedback forms. The GDC committee do collate them and do take them pretty seriously when considering who to invite back to speak next year. The authors do also get the feedback and some of the comments, so if they have good info but are a terrible speaker, say so. And use the whole scale. I forget what it's out of, but if it's 1-10, 1 = bad, 5 = good, 10 = excellent. No point in messing around with 6, 7, 8 as scores - your voice will be lost in the statistical noise.\n\n* Everyone always asks me about the parties and which ones to go to. I'm not a party animal, so I don't enjoy them for what they are. And as far as socialising, they're terrible. Talking is almost impossible and unless you're very wealthy or persistent, you'll be hungry and thirsty. Much better to hang out in a hotel bar or lobby with some friends. You'll still see plenty of game geeks doing the same, but you'll be able to talk and eat. And it's far more relaxing, which means you can gossip til the early hours and still be reasonably awake for the first lecture next day.\n\n* Don't worry if you think you're missing all the cool stuff. Everybody always misses all the cool stuff. This year they're in SF rather than San Jose, so nobody knows where anything is, so even old pros will be confused.\n\n* Go to the Indie Game Festival if you can. Definitely visit the IGF pavillion to check out the games and chat to the developers. The standard is awesomely high every year.\n\n* Play the meatspace MMO game - this year it's "Gangs of GDC". Doesn't usually take up a significant amount of time. It's a good excuse to be silly and talk to people.\n\n* The Programmers Challenge is back! Always amusing for the uber-geek.\n\n* The Developer's Choice Awards are worthwhile and not as bogus as most of the other awards. Go to them and cheer for people that don't suck.
''WORK IN PROGRESS'' - which is why it's not on the main page yet.\n\nA topic guaranteed to cause an argument amongst the shipping veterans is where you should do editing of levels, scripts, shaders, etc. There's three obvious options, although different aspects may use different ones, e.g. shader editing may be done in one place while level editing done in another:\n\n-A plugin for your DCC tool of choice (Max, Maya, XSI, etc)\n-A standalone editor.\n-A part of your game.\n\nThe second two are obviously close - the difference being whether you have to do a save/convert/reload cycle every time you want to test something in the actual game. For some console developers, this is a more obvious distinction as it means they don't have to make a PC version of their game simply to test & edit it. To simplify the argument, I'm going to lump the second two together and say there's two choices - embedded in your DCC tool, or as a fully-custom editor that shares as much as possible with your game engine.\n\nThe advantages of making your game be the editor are obvious. You get to see things as they really are, almost instantly. You're using the real rendering engine, you're using the real asset-management system, and you're using the real physics engine. If some cunning design or model won't work inside your engine, you'll know about it immediately, not three months later when you've already authored a bunch of content. Your artists and designers can experiment however they like, because they see the results immediately and know if they look good or play well. The biggest advantage, if you make it right, is that level designers can set up objects and scripts, hit a button, and they're instantly playing the game. Keeping that design/test/tweak iteration cycle as fast and tight as possible is the key to excellent gameplay. The same is true of lighting (whether realtime or lightmap-driven) and sound environments - both of which rely on having large chunks of the environment around to see whether they work or not, and are almost impossible to do in small unconnected pieces, or without very fast previews.\n\nThe disadvantages are that it's more complex to make an engine that can run in two different modes - editing, and playing. The editing mode will be running on a powerful PC, often with unoptimised assets (optimising takes time and leads to longer turnaround cycles - you don't want that).\n\nThe argument for putting the editing tools inside the DCC tool are obvious - your artists already know how to use the tool, and it's a bunch of editing & rendering code that's already been written. Why reinvent the wheel? Ordinarily I'd agree - wheel reinvention is a plague, and burns far too much programmer time in our industry. But hang on a bit - is that really true?\n\n\n\n\n\nFundamentally, there's a few big problems we have to deal with:\n*Previewing assets in the actual game environment. That means shaders, textures, lighting models, shadows, lightmaps. If the artists can't see all that as a fast preview while they're building the model, they're going to be doing a lot of guesswork.\n*Level design. Placeholder artwork, hacked-in control schemes, pre-scripted actor movement - doesn't matter - you need to give the designers tools to work with so they can start getting the broad-strokes design stuff up and running so that everyone can get a feel for what the game is going to be about.\n*Sound & animation design. Seems odd to put these two together? Not really. Both have a time component to them, and they need smooth blending. This makes them fundamentally different things to rendering and lighting. But you still need to be able to easily preview them in the real game. Also, animations frequently have sound triggers on them - footsteps, equipment sounds, etc. For bonus annoyance, typically the animator and the sound guy aren't the same person, don't know how to use each other's tools, and will want to edit the files independently.\n*Keep iteration times down. And by "down" I mean less than 15 seconds from tweaking a model/texture/light/object placement/trigger/script to trying it out in the game. We all know that iteration and evolution is at the heart of a good game, so we need to keep the dead time in that iteration loop to the lowest possible.\n\nMy preferences:\n\nMake your engine able to reload assets at the press of a button. Having to shut it down and start it up wastes time, especially as those are the areas you tend to optimise last in a game. You need your artists to be able to edit something in the DCC tool, hit a button, and have the results visible right then. This requires a decent amount of infrastructure, though a lot of that is shared with things like streaming systems (which are really good things to have), and stuff like dealing with Alt+Tab on ~PCs. Even if all you can do is reload existing assets, that's a big time-saver. If you can further and import new models with new texture names, etc. that's even better.\n\nMake sure your engine can read assets directly from the exported data, or do so with minimal and fully-automated processing. It doesn't have to render them quickly, and you obviously need another optimised format that you will actually ship data in, but you also need something with a very quick turnaround time. It doesn't matter if the model the artists is currently editing is loading all its textures from raw ~TGAs - we all have big video cards these days. Once they're happy with the textures, a batch script or tool can compress them down to your preferred format. This can either be done overnight, or on another preview button so the artist can see what the compressed version actually looks like.\n\nDitto with things like vertex cache reordering and optimal animation-bone partitioning - none of that stuff matters while the artist is actually editing the model - do it in a batch process overnight. One of the really nice things about Granny3D that we've focussed on a fair bit over the years is the forward and backward compatibility of the file format - you can store all sorts of data in a .~GR2 file, both raw and compressed, and they should all be loadable by anything at any time.\n\nOne subtlety here is that a processor should never modify original data. The stuff that comes of out the DCC tool should be minimally processed during the export process - convert it to a standard format, but leave as much data in there as you can. Later processing steps can strip that data out, but they will do so with a different file as a target, not an in-place modification. That way when the optimised format changes, you don't need to fire up the DCC tool and do a while bunch of re-exporting - you can just run a batch file on all the raw exported data and have it in the new and improved format.\n\nThe level & lighting editor should be the game. Using Max/Maya/XSI/Lightwave as a level editor never works well. I've tried it both ways, and I've discussed this with others. Think about it - all you're getting out of your DCC tool is a camera movement model (easily replicated), a bad renderer (you already have a good renderer), some raycast mouse-hit checking (every game has these already) and a bad button UI that I guarantee you will annoy the snot out of you when you actually go to use it. Trying to use the DCC tool's interface to do any serious previewing is horrible - just think about trying to implement some sort of shadowbuffer rendering, or implementing the movement logic of the character inside inside a DCC tool - it's pretty absurd. This is not reinventing the wheel - there's just so little of a DCC tool you actually need or want in something of the scope of a level editor.\n\nThe animation & sound editor should absolutely be the game. There's just no doubt here - DCC tools don't help at all. Because you need to be able to see the blending and transitions between different animations, and the 3D sound environment as it really will be, a DCC tool doesn't bring anything to the table. The way I've seen animation editors done really well in the past is that the game is playable (ideally with control over any character) and the last thirty seconds are continuously recorded. At any time the animator/sound guy can pause and scrub through that time window, tweaking blends and when sound events happen.\n\nSo that's my rant - the smartest coders should be tools coders, and they need to write level editors and makefile-driven processing chains. DCC tools are for Digital Content Creation on a micro scale - individual characters, individual animation sets. They're not for entire levels, and they're really bad at previewing.
This is pretty handy - [[GameMiddleware|http://www.gamemiddleware.org/middleware/]] - a website that is just a big list of games middleware. I'm so tired of people reinventing perfectly good wheels. Unless your game is going to focus on having the best system X in the world, why not go and look if someone's already got a perfectly good X you can just buy? Then you can spend your time solving some more interesting problems.\n\nAlso tinkered with the titles. Not much point having the titles just be the date when the date's written just below them. And the RSS looks more sensible that way.
Granny3D is a big bag of goodness from [[RadGameTools]]. It's an exporter of all sorts of data from Max, Maya and XSI, it's an asset pipeline manipulation and conversion tool, it's a normal-map generator, it's a really cool file system that copes with endianness, 32 or 64 bit pointers, and versioning, it ships on 11 different platforms, and above all it's the world's most superlative runtime animation system.\n\nGranny was written by [[Casey Muratori|http://www.mollyrocket.com/]] originally, handed over to me for two exciting years, and then taken over by [[Dave Moore|http://onepartcode.com/]]. Even though I haven't worked on Granny in a while, I still pimp it like crazy to anyone within earshot. Seriously - it's a bunch of really cool tech, a bunch of rather tedious tech, a bunch of gnarly problems solved by the brightest minds in the business (and me), and it's all wrapped up into a lovely little bundle that if you were to replicate yourself would make you officially insane. Pay the money and get on with something more interesting! When I think of all the man-years it would have saved me when working at MuckyFoot, ooh it makes me mad. As Dave said once "we can't replace your existing animation coder, but we can paint a [[big red S|http://en.wikipedia.org/wiki/Superman]] on his chest".\n\nAs time goes on, Granny becomes more and more about the entire toolchain - taking your mesh, texture, tangent space, animation and annotation data from the various DCC tools, exporting it in a reliable robust cross-platform format, processing that data in a large variety of ways, and then finally delivering the data in optimised form to your game engine. This is an under-appreciated area of games programming. It's not viewed as a sexy job for coders, and it tends to be done hastily, often by junior programmers. As a result, it is usually a source of much pain, not least by the artists and designers who have to use it. Don't let this happen to you - start your toolchain on a solid foundation.
Hello and welcome to my blog. Always fresh, always technobabble. Read the chronological blog that is usually listed below, or use some of the links to the left. Or EmailMe. Although this is technically a wiki, you'll just be editing your own personal copy, not the public one.
A while back I made a "How to Walk" demo for Granny3D (slides available here: [[http://www.eelpi.gotdns.org/papers/papers.html|http://www.eelpi.gotdns.org/papers/papers.html]]) that showed how to do interpolation of walking stances, how to cope with uneven terrain and do foot IK. It's trickier than it looks. I always meant to keep updating that demo with more features - most notably missing was how to stop! I'm sure many of you remember it from GDC as "the one with Granny in a Xena: Warrior Princess outfit". It's certainly... eye catching (thanks to [[Steve Theodore|http://www.bungie.net/Inside/MeetTheTeam.aspx?person=stevet]] for the memorable artwork). Anyway, I got distracted by Larrabee, and never did add those features, which I always felt bad about.\n\n[[Rune Skovbo Johansen|http://runevision.com/blog/]] has done a similar system, but taken it a lot further. He uses fewer animation cycles and copes with more terrain. Cool features include 8-way motion, ledge avoidance, stop/start, leaning into slopes, four-legged animals, and foot placement on stationary rotation. It's a very comprehensive locomotion system, and I'm going to see if I can get to [[his GDC talk|https://www.cmpevents.com/GD09/a.asp?option=C&V=11&SessID=9004]] on it.
I've converted my old article on impostors to HTML so it's "live" and searchable and so on. Also added a few hindsights. The article was written before [[StarTopia]] was published, so there's a few implementation notes added about what did and didn't work in practice. It's in the [[papers and articles section|http://www.eelpi.gotdns.org/papers/papers.html]]
People have been bugging me for ages, so I guess it's time to start a blog. I'll try not to ramble too much.\n\nFirst up - [[The Elder Scrolls IV: Oblivion|http://www.elderscrolls.com/]] - what a totally fantastic game. I'm playing it on the Xbox 360 and it's gorgeous. Yes, yes, it's got gameplay up the wazoo as well (that's why I'm 30 hours in and barely feel like I've started), but I'm a graphics tech whore, so I reserve the right to spooge over the pixels. It's amazingly immersive - they've worked out that immersion is defined not by your best bit, but by your worst, and they've spent a lot of time filing off all the rough edges. It's one of the few games where the screenshots on the website are what it looks like //all the time//. Other games, you walk around and there's texture seams all over the place, and when you get close to some things it's really obvious they're just a big flat billboard, or whatever. Not in this. I spent quarter of an hour yesterday just walking around looking at trees and plants and fallen logs and lakes and little stone bridges and the way the sunlight went through the leaves - totally forgot about playing the game until a mud crab almost tore off my leg.\n\nI probably need to get out more into the real world - I hear they have some pretty neato trees and stuff as well, not to mention the world's most powerful photon-mapper. And fewer mud crabs.
There's a fundamental signal-processing concept called various different names, mostly with the word [[Nyquist]] in them, which (if you wave your hands a lot and ignore the screams of people who actually know what they're talking about) says that an object that is 32 pixels high on screen only needs to use a 32x32 texture when you're rendering it. If you give it a 64x64 texture, not only will the mipmapping hardware ignore that extra data and use the 32x32 anyway, but //it's doing the right thing// - if you disable the mipmapping and force it to use the larger version, you will get sparkling and ugliness.\n\nObviously it's more complex than that, and I'll get into the details, but the important point is - the size of texture you need for an object is directly proportional to the object's size on-screen. And that's what texturing hardware does - it picks appropriate mipmap levels to ensure that this is the size its actually using. Every graphics programmer should be nodding their head and saying "well of course" at this point. But there's another thing that is subtly different, but important to realise. And that is that so far I haven't said how big my texture is, i.e. what size the top mipmap level is - because it doesn't matter. Whether you give the graphics card a 2048x2048 or a 64x64, it's always just going to use the 32x32 version - the only difference is how much memory you're wasting on data that never gets used.\n\nIf you are streaming your textures off disk, or creating them on-demand (e.g. procedural terrain systems), this can be incredibly useful to know. If you don't need to spend cycles and memory loading big mipmap levels for objects in the distance, everything is going to load much quicker and take less memory, which means you can use the extra space and time to increase your scene complexity. Obviously my approximation above is very coarse - it's not just size in pixels that matter, because we use texture coordinates to map textures onto objects. So how do you actually find the largest mipmap you need?\n\nAs a thought experiment (this is going to start out absurd - but bear with me), take each triangle, and calculate how many texels the artist mapped to that triangle for the top mipamp level. Then calculate how many pixels that triangle occupies on the screen. If the number of texels is more than the number of pixels, then by the above rule you don't need the top mipmap level - throw it away, and divide the number of texels by 4. Keep doing that until the number of texels in the new top mipmap level is less than the number of screen pixels. (keen readers will spot that trilinear filtering means you actually need one more mipmap level than this - to keep things simple, I'm going to ignore that for now, and we can add one at the end). Do this for all the triangles that use a certain texture, and throw away all the mipmap levels that none of the triangles need.\n\nThat's the concept. Obviously far too expensive to do in practice. The first step is to get rid of the loop and say that the number of mipmap levels you can throw away = 0.5*log2 (top_mipmap_texel_area / screen_area). Quick sanity check - 128x128 texture has area 16384 texels. If you draw it to a 32x32 pixel square on screen, that's 1024 pixels, so you need to throw away 0.5*log2(16384/1024) = 0.5*log2(16) = 2 mipmap levels, which is correct. Why multiplying by half? Because we actually don't want log2(x), we want log4(x), because the number of texels drops by 4x each mipmap level, not 2x. But log4(x) = 0.5*log2(x).\n\nObviously, you can precalculate the value "top_mipmap_texel_area" - once a mesh is texture-mapped, it's a constant value for each triangle. But calculating the screen-area of each triangle each frame is expensive, so we want to approximate. If we pick an area approximation that is too high, then the actual value of screen_area will be higher than the one for the current frame, and we'll throw away fewer mipmap levels than we could have done. This isn't actually that bad - yes, we waste a bit of memory, but the texturing hardware still does the right thing, and will produce the right image. So approximating too high doesn't change image quality at all. Given that, the approximation we make is to assume that every triangle is always parallel to the screen plane - that its normal is pointing directly at the viewer. This is the largest the triangle can possibly be in screen pixels. In practice it will be at an angle and take fewer screen pixels.\n\nSo how large is this maximum size? Well, if we assume mesh deformation and skinning don't do crazy things and ignore them, a triangle always stays the same size in world units (e.g. meters). At distance D from the viewer, with a horizontal screen field of view of F radians, and a horizontal pixel size of P, the screen length of m world meters is (m/D)*(P/tan(F)) pixels in length. That's world length to screen length, but we want world area to screen area, so we square the result. But if we feed in world area of the triangle, M=m^2, then we get the screen area is (M/(D^2))*((P/tan(F))^2) = M*((P/(D*tan(F)))^2). As a quick sanity-check: double the distance away = quarter the screen pixels. Double the screen resolution = four times the pixels. Widen the FOV = smaller screen area.\n\n(P/tan(F))^2 gets calculated once a frame, so that's a trivial cost. And we'll make another approximation that the distance D is not done at every triangle, it's measured from the closest point of the bounding volume of the object. Again, we're being conservative - we're assuming the triangles are closer (i.e. larger) than they really are. So for a single mesh, we can calculate ((P/(D*tan(F)))^2) just once, and then use it for all the triangles. For brevity, I'll call this factor Q, so for each triangle, screen_area <= (world_area * Q).\n\nSo looking back at the mipmap calculation, we calculate 0.5*log2 (top_mipmap_texel_area / screen_area ) for each triangle, take the minimum of all those, and that's how many mipmap levels we can actually throw away.\n\n.... min_over_mesh (0.5*log2 (top_mipmap_texel_area / screen_area))\n >= min_over_mesh (0.5*log2 (top_mipmap_texel_area / (world_area * Q))\n..= min_over_mesh (0.5*log2 ((top_mipmap_texel_area / world_area) / Q)\n..= 0.5*log2 (min_over_mesh(top_mipmap_texel_area / world_area) / Q)\n\nAnd of course the value (top_mipmap_texel_area/world_area) is a constant for each triangle the mesh (again, assuming skinning and deformation don't do extreme things), so the minimum value of that is a constant for the entire mesh that you can precalculate. The result is that we haven't done any per-triangle calculations at all at runtime. If you grind through the maths a bit more, you find that it all boils down to A+B+log2(D), where A is a per-frame constant, B is a per-mesh constant, and D is the distance of the mesh from the camera (remember that log2(x/y) = log2(x)-log2(y)). Again, quick sanity check - if you double the distance of a mesh from the viewer, log2(D) increases by 1, so you drop one mipmap level. Which is correct.\n\nThis is pretty nifty. If you're doing a streaming texture system, it means that for each mesh that uses a texture, at runtime you can do a log2 and two adds each frame and you get a result saying how many mipmap levels you didn't need. If you remember that throwing away just one mipmap level saves 75% of the texture's memory, this can be a huge benefit - who doesn't want 75% more memory!\n\nThis isn't just theory - I implemented it in the streaming system used for [[Blade II]] on the Xbox and ~PS2 engines at [[MuckyFoot]], and it saved a lot of memory per texture. This meant we could keep a lot more textures in memory. As well as allowing more textures per frame, it also meant that we could keep a lot more textures cached in memory than we actually needed for that frame. This allowed us to stream far more aggressively than we originally intended. The initial design for the ~PS2 engine was to not stream - we assumed that DVD seek times would cripple a streaming system. But because there was a lot more memory available for textures, we could prefetch further ahead and reorder the streaming requests to reduce seek times. And the system actually worked pretty well.\n\n''Flip the Question''\n\nThe next neat trick is instead of asking "how many mipmap levels do I throw away", you ask "how many do I need". This is obviously just the same number, subtracted from log2(texture_size). If we go back to the original equation of 0.5*log2 (top_mipmap_texel_area / screen_area) - look at how you'd actually calculate "top_mipmap_texel_area" for a triangle. You find the area in the idealised [0-1] UV texture coordinate space, and then you multiply by the number of texels in the texture. So the answer to the new question "how many mipmaps do I need" is:\n\n...log2(texture_size) - 0.5*log2 ((texture_size^2) * area_in_uv_space / screen_area)\n= log2(texture_size) - 0.5*log2 (area_in_uv_space / screen_area) - 0.5*log2(texture_size^2)\n= log2(texture_size) - 0.5*log2 (area_in_uv_space / screen_area) - log2(texture_size)\n= - 0.5*log2 (area_in_uv_space / screen_area)\n= 0.5*log2 (screen_area / area_in_uv_space)\n\nHey! What happened to the texture size - it all canceled out! Yep - that's right. It goes back to something I mentioned right at the top. The texture selection hardware does not care what size the top mipmap level is - unless it wants one that doesn't exist, obviously. But if you give it a large enough texture, it doesn't matter how big that texture is.\n\nAgain, every graphics coder is saying "right, I knew that, so?" But this is actually quite a non-obvious thing. It means that if you have a streaming texture system, the only thing that matters is "how many mipmaps do I need to load into memory" - the rest are left on the DVD and not loaded into memory. And the answer to that question is independent of the size of the original texture - it only depends on the mesh and the UV mapping and the distance from the camera. So the actual graphics engine - the streaming and rendering - //doesn't care how large the textures are//. And that means that the rendering speed, the space taken in memory, and the time needed to read it off disk are all identical even if the artists make every texture in the entire game 16kx16k.\n\nTo say it again: the only limit to the size of textures the artists can make is the total space on the DVD, and the time taken to make those textures.\n\nI finally grokked this when I was making the [[MuckyFoot]] engine to be used for the games after [[Blade II]] and I kicked myself. It had actually been true for the [[Blade II]] engine - I just hadn't realised it at the time. But it was a pretty cool thing to go and tell the artists that there were no practical limits to texture sizes. They didn't really believe me at first - it's an almost heretical thing to tell 15 artists who have spent their professional lives with coders yelling at them for making a texture 64x64 instead of 32x32. Nevertheless, if you have a streaming system that only loads the mipmap levels you need, it is absolutely true.\n\n''Practical Problems''\n\nOf course there's still some practical limits. One thing that breaks this is the standard artist trick of using fewer texels for less important things, such as the soles of shoes or the undersides of cars. What happened was that they would use all this new texture space for interesting things like faces and hands, so the texture size would grow. But the soles of the feet would stay the same size in texels - because who cares about them? Then the preprocessor would calculate the mipmap level it would need for the texture, and the soles of the feet would dominate the result, because it would try to still map them correctly - it doesn't know that they're less important than everything else. There's a few ways to solve this - ideally the artists would be able to tag those faces so that they are ignored in the calculation, but this can be tricky to retrofit to some asset pipelines. A way that I found worked pretty well was to take the minimum linear texel density for each triangle (this copes with stretches and skewed texture mappings), and then for the mesh take the maximum of those minimums. This will find triangles on things like the face and hands, and that will be the density you assume the artist intended. If they didn't give as many texels per meter to other triangles, you assume that was intentional - that they deliberately wanted lower resolution on those areas.\n\nOne counter-intuitive thing is that the dead-space borders between areas in a texture atlas need to grow as well. If your artists were making 256x256 textures with 4 texels of space between parts, when they up-rez to 1024x1024, the space needs to grow to 16 texels as well. The border space is there to deal with bleeding when you create mipmaps, so you need to make sure that e.g. the 64x64 mipmap is the same in both cases.
''WORK IN PROGRESS'' - which is why it's not on the main page yet.\n\nPaint mipmap levels with alpha=1 for "valid", alpha=0 for "invalid". Sample twice, once with LOD clamp, and blend based on alpha.
Codename for an Intel processor architecture that I work on. There's been some disclosure, but not everything yet. Coming soon! Top links at the moment:\n\n[["Larrabee: A Many-Core x86 Architecture for Visual Computing" - presented at Siggraph 08|http://www.siggraph.org/s2008/attendees/program/item/?type=papers&id=34]]\n[[Three short Larrabee presentations at Siggraph 08 as part of the "Beyond Programmable Shading" course|http://s08.idav.ucdavis.edu/]]\n[[A pretty comprehensive AnandTech article|http://anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3367&p=1]]\n[[The Wikipedia page|http://en.wikipedia.org/wiki/Larrabee_(GPU)]]
Yay! Michael Abrash and I are finally going to do a talk each about the instruction set we helped develop: [[Rasterization on Larrabee: A First Look at the Larrabee New Instructions (LRBni) in Action|https://www.cmpevents.com/GD09/a.asp?option=C&V=11&SessID=9138]] and [[SIMD Programming with Larrabee: A Second Look at the Larrabee New Instructions (LRBni) in Action|https://www.cmpevents.com/GD09/a.asp?option=C&V=11&SessID=9139]]. They're pretty much a pair - going to mine without seeing Mike's first won't mean very much for example. Note that these aren't graphics talks, they're for anybody who wants to program these cores in assembly or C. We will be using parts of the graphics pipeline as examples, because that's what we've spent most of our time doing, but there's no graphics knowledge required at all. Just bring your assembly head - we're going all the way to the metal.\n\nThere's very few programmers that can say they got to invent their own [[ISA|http://en.wikipedia.org/wiki/Instruction_set]], and I'm really looking forward to finally being able to talk about it in public after about three years of secrecy. Creating an ISA is all about compromises between programmer flexibility and how difficult the hardware is to build, but I am always astonished how much we managed to pack in without too much screaming from the hardware folks. It will be interesting seeing how people react to it - it's got a lot of funky features not found in other ISA that I know of.
I've been trying to keep quiet, but I need to get one thing very clear. Larrabee is going to render ~DirectX and ~OpenGL games through rasterisation, not through raytracing.\n\nI'm not sure how the message got so muddled. I think in our quest to just keep our heads down and get on with it, we've possibly been a bit too quiet. So some comments about exciting new rendering tech got misinterpreted as our one and only plan. Larrabee's tech enables many fascinating possibilities, and we're excited by all of them. But this current confusion has got a lot of developers worried about us breaking their games and forcing them to change the way they do things. That's not the case, and I apologise for any panic.\n\nThere's only one way to render the huge range of ~DirectX and ~OpenGL games out there, and that's the way they were designed to run - the conventional rasterisation pipeline. That has been the goal for the Larrabee team from day one, and it continues to be the primary focus of the hardware and software teams. We take triangles, we rasterise them, we do Z tests, we do pixel shading, we write to a framebuffer. There's plenty of room within that pipeline for innovation to last us for many years to come. It's done very nicely for over a quarter of a century, and there's plenty of life in the old thing yet.\n\nThere's no doubt Larrabee is going to be the world's most awesome raytracer. It's going to be the world's most awesome chip at a lot of heavy computing tasks - that's the joy of total programmability combined with serious number-crunching power. But that is cool stuff for those that want to play with wacky tech. We're not assuming everybody in the world will do this, we're not forcing anyone to do so, and we certainly can't just do it behind their backs and expect things to work - that would be absurd. Raytracing on Larrabee is a fascinating research project, it's an exciting new way of thinking about rendering scenes, just like splatting or voxels or any number of neat ideas, but it is absolutely not the focus of Larrabee's primary rendering capabilities, and never has been - not even for a moment.\n\nWe are totally focussed on making the existing (and future) DX and OGL pipelines go fast using far more conventional methods. When we talk about the rendering pipeline changing beyond what people currently know, we're talking about using something a lot less radical than raytracing. Still very exciting for game developers - stuff they've been asking for for ages that other ~IHVs have completely failed to deliver on. There's some very exciting changes to the existing graphics pipelines coming up if developers choose to enable the extra quality and consistency that Larrabee can offer. But these are incremental changes, and they will remain completely under game developers' control - if they don't want to use them, we will look like any other fast video card. We would not and could not change the rendering behaviour of the existing ~APIs.\n\nI'll probably get in trouble for this post, but it's driving me nuts seeing people spin their wheels on the results of misunderstandings. There's so much cool stuff on the way that we're excited about, and we think you're going to love them too.
Well, since [[Beyond3D|http://www.beyond3d.com/content/news/557]] picked up the story, I guess I should comment. I've not technically moved to Intel just yet, but the plans are in motion. Being one of those scary foreigners, there's inevitably some paperwork to sort out first. The wheels of bureaucracy grind slowly.\n\nThe SuperSecretProject is of course [[Larrabee|http://www.google.com/search?q=Larrabee+intel]], and while it's been amusing seeing people on the intertubes discuss how sucky we'll be at conventional rendering, I'm happy to report that this is not even remotely accurate. Also inaccurate is the perception that the "big boys are finally here" - they've been here all along, just keeping quiet and taking care of business.\n\nThat said, there's been some talented people joining the Larrabee crew - both as Intel employees and as external advisers. That includes "names" that people have heard of, but also many awesomely smart people who haven't chosen to be quite as visible. It's immensely exciting just bouncing ideas off each other. Frankly, we're all so used to working within the confines of the conventional GPU pipeline that we barely know where to start with the new flexibility. It's going to be a lot of fun figuring out which areas to explore first, which work within existing game frameworks, and which things require longer-term investments in tools and infrastructure - new rendering techniques aren't that much use if artists can't make content for them.\n\nExpect to hear much much more later (you just try and stop me!), but for now we must focus on getting things done. There's plenty of time for public discussions of algorithms and suchlike in the future. Thanks for the attention, but I need to engage the Larrabee cloaking field once more. I now return you to your irregularly scheduled blog.
I gave a talk at Stanford University recently, and it reminded me that I should make a list of links to public Larrabee info, so [[here it is|http://www.eelpi.gotdns.org/larrabee/larrabee.html]].\n\nThe Stanford talk was fun, and there's video of it available (link on the page above). So now you can watch me say "um" and "er" a lot. I'm usually better about that, but this was a small auditorium, I knew a surprising number of the audience, and so I forgot to put my "public presenter" head on.\n\nOn the recent media attention: we're not cancelled - far from it. The first version still needs some work, and we have a lot of software to write to turn a bunch of x86 cores into something that "looks" like a GPU to Windows before we can ship it to the public. That all takes a lot of time, and we won't be ready for the hoped ship this year. But it's not nearly as dramatic as some of the stories I've seen out there - people do love a sensational headline. I've never been on the inside of media attention before, but it's taught me that you need to remember the old adage - never believe anything you read on the internet.
One of my papers turns out to be useful to someone. And they asked me what my license was - or rather, their legal department asked. This then forced me to find a suitable license, which is very annoying, because the real one is of course "don't be a dick". I don't think the lawyers liked that one. After much browsing at [[http://www.opensource.org/licenses/|http://www.opensource.org/licenses/]] and finding a lot of wordy annoying ones, I settled on the MIT license, without the middle paragraph that requires attribution, because that's just pushy. Short and sweet - and the only objectionable thing is that LAWYERS SEEM TO NEED TO YOU TALK IN CAPS A LOT. Why is that? Maybe it makes judges like you more or something? Whatever - just go out there and do good stuff with it, people.
This is a bit of a random list of stuff. I tried to organise it into a nice narrative and made a mess of it. So instead I'll just write a bunch of words.\n\n\n''What is ASSERT for?''\n\nA recent discussion on AltDevBlogADay revealed that the question "what is ASSERT for?" has many possible answers and lots of people have different views on this. Sometimes people get hung up on what the word "assert" means, so let's list a bunch of useful tools we'd like to have without using that word. Chris Hargrove wrote a great taxonomy, so I'm just going to steal it:\n\n''Premortem'': Immediately fatal, and not ignorable. Fundamental assumption by an engineer has been disproven and needs immediate handling. Requires discipline on the part of the engineer to not add them in situations that are actually non-fatal (rule of thumb being that if a crash would be almost certain to happen anyway due to the same condition, then you’re no worse off making an assert).\n\n''Errors'': Probably fatal soon, but not necessarily immediately. Basically a marker for “you are now in a f*cked state, you might limp along a bit, but assume nothing”. Game continues, but an ugly red number gets displayed onscreen for how many of these have been encountered (so when people send you screenshots of bugs you can then point to the red error count and blame accordingly). Savegames are disabled from this point so as not to make the error effectively permanent; you should also deliberately violate a few other ~TCRs as soon as an error is encountered in order to ensure that all parties up and down the publisher/developer chain are aware of how bad things are. Errors are technically “ignorable” but everyone knows that it might only buy you a little bit of borrowed time; these are only a small step away from the immediately-blocking nature of an assert, but sometimes that small step can have a big impact on productivity.\n\n''Warnings'': Used for “you did something bad, but we caught it so it’s fine (the game state is still okay), however it might not be fine in the future so if you want to save yourself some headache you should fix this sooner rather than later”. Great for content problems. Also displayed onscreen as a yellow number (near the red error number). You can keep these around for a while and triage them when their utility is called into question.\n\n''Crumbs'': The meta-category for a large number of “verbose” informational breadcrumb categories that must be explicitly enabled so you don’t clutter everything up and obscure stuff that matters. Note that the occurrance of certain Errors should automatically enable relevant categories of crumbs so that more detailed information about the aforementioned f*cked state will be provided during the limp-along timeframe.\n\n(Chris actually called the first one an "assert" but I want to use that as a more generic term).\n\nI tend to use the actual ASSERT macro for the first three, and they'll all also put into a log. I also don't differentiate between the first two - it's hard to predict what's fatal and what's not, so I just let the cards fall where they will. Sometimes I'll put in the text of an "Error" or "Warning" something like "probably not fatal, but will cause strange effects." - generally "Premortems" will be something like "pThingie != NULL". I like the idea of the big red/yellow numbers on the screen - never done it, but it's an idea well worth stealing.\n\n\n''Fixing ASSERT''\n\nThe basic ASSERT most runtimes give you is way too heavy-handed and noisy. It'll do if you have nothing else, but it's got a whole bunch of problems:\n* When you fail, you get a dialogue box. This drives me nuts - I just want to go straight into the debugger as if I had hit a breakpoint.\n* But if someone is running without a debugger (e.g. an artist) they do want a slightly helpful message before the machine detonates in a shower of sparks.\n* Some ~ASSERTs are really just warnings, and sometimes I only want to be warned once, not every time they're hit. Especially if someone else wrote it and I don't understand what it's warning me about. This also includes things like "object went faster than the speed of sound" or "player intersects ground" warnings that you want to know about once, but will probably persist for a few frames before (hopefully) fixing themselves.\n\nThe things I do to make ASSERT more usable are, in rough order of easy to hard:\n\n* In debug mode, an assert becomes the debugger breakpoint, e.g. int3 on x86, and it's inlined in the code, not in a subroutine you have to pop the stack on. In release builds it is a dialogue box.\n* Invent ASSERTONCE. Does exactly what it says on the tin.\n* Each assert should have a useful text message for release build. This should at the very minimum tell the user who to poke about it. If a designer or artist can't immediately find which coder to poke about an assert, it simply increases noise and is worse than useless.\n* If an assert is an advisory about an asset, e.g. "texture is not power-of-two" then it should always say which asset it refers to. Again, there's no point telling an artist that a texture is not square if you don't say which one.\n* Even better is to tag each asset with the creator's name. Then the assert should say not only what the asset is but who created and ideally when. So if it's an asset made within the last day or so, it's not a big deal. A month old - fix that shit! Also it helps if the build knows who is currently running it. We found it helpful if people could easily ignore their own warnings if they were just trying something out, but wanted to know about stuff they hadn't immediately changed. Conversely, if checking changes in, they want to know about their screwups, not other peoples'.\n* People who aren't coders tend to ignore warnings. Fact of life. I clearly remember listening to two artists running the daily build - one said "oh, what's that" to which the other replied "oh it's another ignoreall" <click> ... <crash> "what happened?" That evening I removed the "Ignore All" button from the assert dialogue and made it something slightly less trivial to brush aside. The eventual solution was to log every assert and warning to somewhere useful to be analysed later. The ghetto way was to log it to a consistently-named file on the HDD in a public directory. Then the person in charge of the build (hi there!) periodically scraped all the work machines for this file, threw it into a database and found the most common and/or oldest-occurring asserts and went and bitched at people until they were fixed. It's surprisingly effective. A more elegant way would be to connect to a bug tracking system and auto-file bugs. Do the simple thing first - you'll be glad you did - if only because the hassle of dealing with it will prompt you to do the right thing eventually.\n\nTo give you an idea of how aggressively I use asserts, I picked a 2056-line C file of a private project that's mostly to do with gameplay. 235 are whitespace and 482 are lines with just a curly brace on them. 189 lines are comments - I tend to write more comments than most people, especially in complex code. 115 of those lines are asserts. In this case since the file doesn't deal with external assets, almost all of them are "premortem" types like pointer-checking, or sanity checks like sensible ranges for values. Leaving 1035 of actual code. So it's roughly a 10:2:1 ratio of code:comments:asserts, which feels about right. Another file has 2182:444:266 which is very similar.\n\n\n''Degrees of paranoia''\n\nI have a #define PARANOIA which can be set to 0 (shipping builds), 1 (normal development/debug), 2 (debugging something tricky) and 3 (evil things like memory scribblers). This is like the various DEBUG defines, except the value is important. Some of the more expensive debug checks only happen at the higher paranoia levels, and it's also somewhat orthogonal - you can enable paranoia without using debug, e.g. for trying to find bugs that only show up in release.\n\nAll standard ~ASSERTs are on PARANOIA>=1. Malloc/free/new/delete all do things like zapping data to 0xdeadbeef when PARANOIA>=1. More expensive checks go on PARANOIA>=2, and really expensive ones on PARANOIA>=3. The idea is that as you write code, you freely hurl ~ASSERTs and debug checks around like crazy. As the framerate starts to suffer, don't remove the checks, just wrap the older ones in higher and higher levels of PARANOIA. If you have a bug that the standard ~ASSERTs don't find, raise the paranoia levels. Every now and then (e.g. on checkin), do some runs on the higher paranoia levels as well.\n\nMost of my classes (or similar groups of functionality) have methods called Check() which does a general consistency check on the object - do its values lie in sensible ranges, if it has pointers to objects do they still exist, if there's meant to be a pointer back does it do so, etc. This function is nulled out in the header unless PARANOIA>=0. There is also a function ~ParanoidCheck() that calls Check() only if PARANOIA>=2. So in general I call Check() at creation, destruction and when changing pointer relationships e.g. when a character picks up a weapon, I'll call it on the weapon & the character to make sure nobody else has the weapon, the character doesn't already have a weapon in that hand, and that both actually exist. ~ParanoidCheck() can be sprinkled anywhere you have even the slightest suspicion there may be a problem, because unless PARANOIA>=2, it doesn't do anything. Also if PARANOIA>=2, once a game tick I will call Check() on every object in the entire world.\n\nIf there's any really expensive tests (e.g. checking every single pointer on every single object after every single object is processed every time), then I'll put them on PARANOIA>=3. That's basically the setting where it's such a nasty bug to find that I'm happy to leave it running for an hour a frame if in exchange an ASSERT fires to show me where it is. You can also put more expensive code inside Check() methods hidden behind PARANOIA>=3.\n\n\n''Deterministic playback''\n\nI have found this to be such a powerful debugging tool. Make sure your game is deterministically replayable. Note - here I only mean in single-player mode, only replayable on the same machine it was recorded on, and only for the same build. I'm just talking about debugging here, not as a gameplay feature. This can be tricky at times, but the basic steps are:\n\n* Decouple gameplay ticks & processing from machine-dependent things like rendering (this is good for all sorts of other reasons - I should write a post about it one day)\n* Record all player inputs to a log.\n* Make sure all random numbers are only pseudo-random, and store the seeds in the log.\n* Careful about using pointer addresses (which will change from run to run) as data. You can get subtle things e.g. using the pointer for a hash table, then traversing every object in the hash table - you'll get different traverse orders from run to run.\n* Record your debug levels in the log - different optimisation levels can produce different numerical results. This is another reason I keep PARANOIA distinct from DEBUG (though you can still get side-effects that screw things up).\n* Have a "no rendering" mode on a key that will just replay the game as fast as possible - that way you can fast-forward to the time you're investigating.\n\nEvery so often (e.g. 1000 gameplay ticks) autosave the current game state (use a rotating series of saves), append the log to the previous auto-save, and flush the log. This gives you a bunch of cool things:\n\n* Easy reproducibility of bugs, even ones that didn't get noticed for many frames such as AI bugs ("how did he get into that bit of the map?", "why is that henchmen trying to stab someone with a pineapple instead of a knife?", etc). If you get a bug, backtrack through your autosaves, watch the object/character in question and find where the bug originally manifested. This has saved me so much time in the past, it's worth it for just this one feature.\n* Makes sure your save/load works! It's amazing how easy it is to break save/load and not realize it for quite a while. This way, it's in constant use. It also encourages people to keep it fast!\n* Stress tests. You always have a big pool of saved games and player actions to replay whenever you feel nervous about the build integrity (of course they get obsolete as the gameplay changes, but it's still useful for folks not connected with gameplay e.g. graphics, sound, etc).\n* Mitigation for catastrophic bugs or machine instability (e.g. dodgy drivers) in the final released version. If people can always resume, having lost only a few minutes of play, they will be far less angry when asking for support.\n* Easy demo record/playback. Especially useful when you want to fly the camera to cinematic locations and engage the "high-quality rendering" mode with all the bells and whistles at print resolution that runs at 10 seconds per frame.\n\nAnother handy tip - if the game loads from an autosave, have it first copy it somewhere else with a unique name. Otherwise when you're debugging something from an autosave, it's a bit too easy to autosave over it while doing so, which is very frustrating. Also have a rotating sequence of autosaves covering at least half an hour of play - again for those places where the bug happened a long time before it was noticed.\n\n\n''Unit tests''\n\nYou can read tons of stuff about unit tests and how great they are, and I agree. In theory. In practice they are this big thing that says "write tons of code now, and maybe in a year you will fix an extra bug". It's a really hard pill to swallow, and as a result I've never managed to get a codebase of any significant size to use them. I can't even persuade myself to use them much. Result - nice idea, but it just doesn't seem to work in practice. You need something almost as effective without the big chunk of up-front work.\n\nThat's why I like doing things more like asserts. Everyone knows what they do, they're not scary, you write them right alongside the code, but used properly they can be a very powerful documentation and self-checking weapon, and with a few tricks you can gradually grow things into what amount to unit tests without actually sitting down and saying "right - today I'm going to spend four hours writing code that, on a good day, doesn't do anything."\n\nOnce you have good asserts, you've scattered your code with aggressive Check() and ~ParanoidCheck() calls, and you're 0xdeadbeefing your code, you've got 95% of the goodness. Now your unit tests don't really need to be that fancy because they don't actually do the checking, they just feed inputs into the game and the game self-checks. In many cases the unit test might just be loading and replaying a savegame and log. Most of time I find it much easier to "justify" writing a unit-test that would have caught a bug I actually found, whereas writing unit tests for bugs I might never create is mentally draining - I find it far easier to write tons of asserts and self-checkers instead as I go along.
I am TomF\nEmailMe\n\n[[Link to this blog|http://eelpi.gotdns.org/blog.wiki.html]]\n[[RSS feed|http://eelpi.gotdns.org/blog.wiki.xml]]\n[[Main website|http://eelpi.gotdns.org/]]\n\nThe Blog Roll:\n[[My wife|http://eelpi.livejournal.com/]]\n[[Canis|http://www.lycanthrope.org/~canis/]]\n[[Mark Baker|http://technobubble.info/]]\n[[Dave Moore|http://onepartcode.com/main]]\n[[Wolfgang Engel|http://diaryofagraphicsprogrammer.blogspot.com/]]\n[[Jake Simpson|http://blog.jakeworld.org/]]\n[[Rich Carlson|http://www.digital-eel.com/blog/]]\n[[Ignacio Castaño|http://castano.ludicon.com/blog/]]\n[[My DirectX FAQ|http://tomsdxfaq.blogspot.com/]]\n\nWhatIsThisThing\n[[TiddliWiki main|http://www.tiddlywiki.com/]]
I've been doing a lot of complex transformations between various coordinate systems recently, and it's crystallised a practice I've always only done half-heartedly - naming your transformations! If you don't name them right, you can get into some amazing bugs that can take days to figure out. If you do name them right, and do so consistently, it's actually quite difficult to make a mistake. I was reminded of this by [[an aside in a blog post that Fabian Giesen did|http://fgiesen.wordpress.com/2012/06/03/linear-algebra-toolbox-1/]], but I thought I'd expand on it a bit. By the way, you should absolutely read that series of three posts - it can clear up some misconceptions that even experienced coders have.\n\nSo - the naming convention. I'll use under_score_naming, but ~CamelCapsNaming works fine as well of course.\n\nA transformation is typically represented by a matrix, though it can be a variety of things, e.g. Euler angles and an offset, a quaternion and offset, or a dual quaternion. However you do it, you'll usually have an operation to apply a transformation to a position (a vector), which if you do right-to-left multiplication of these things (which seems to be the majority):\n\nVec new_vec = trans1 * old_vec;\n\n...and one to apply a transformation to another transformation:\n\nTrans new_trans = trans2 * trans1;\n\nAnd they associate freely, so that:\n\nVec new_vec = trans2 * ( trans1 * old_vec );\n........................= ( trans2 * trans1 ) * old_vec;\n\nHowever they do not commute, so ( trans2 * trans1 ) != ( trans1 * trans2 ), and not only does (trans * vec) != (vec * trans), the second term doesn't even mean anything - it won't compile.\n\nSo that's just a statement of conventions, and Fabian covered it in a lot more detail. Now what was that about naming?\n\nAll transformations convert a vector of values from one space to another. The typical transformation we're talking about in graphics is the one that describes the position and orientation of an object in the world. This transformation turns positions in model space (which the artist designed in the art package) into positions in world space, ready to be rendered. And for reasons that will become clear, we want to name things in the format "x_from_y". So this sort of transformation should be named "world_from_model". If you also name your vectors this way, you'll get expressions that look like this:\n\nVec vertex_world = world_from_model * vertex_model;\n\nNote the naming here - it's slightly counterintuitive, and this can trip you up. "World_from_model" moves positions from model-space to world-space, so it's what you'd normally call "the orientation and position of the model in the world", which you'd instinctively think would be named the other way around (or at least I do). But to do the consistency tricks I'm going to talk about, you absolutely need the word "world" first and "model" second. So just remember that "it's not the way you think" and you'll be fine. I find it helps to think of transforming the vector (0,0,0) - it starts at the origin in model coordinates, and then it ends up in world coordinates at some place, which will be the "position" of the model in the world.\n\n\nSo why is this naming scheme so nice? Because you can immediately check you have consistent maths by checking the "nearby" parts of the names, as shown by these highlights:\n\nVec vertex_@@world = world@@_from_@@model * vertex_model@@;\n\nThis also applies when concatenating transformations, for example if you saw this:\n\nVec vertex_@@world = tank_turret@@_from_@@tank_barrel * tank_body@@_from_@@tank_turret * world@@_from_@@tank_body * vertex_barrel@@;\n\nWAIT AT MINUTE - this is a mess - none of the parts near each other match. Clearly, whoever wrote this got the order of the transformations wrong. What they should have written is this:\n\nVec vertex_@@world = world@@_from_@@tank_body * tank_body@@_from_@@tank_turret * tank_turret@@_from_@@tank_barrel * vertex_barrel@@;\n\n...and those all match nicely. In fact, given this collection of matrices, there is only one valid way to combine them - you can't make a mistake!\n\n\nOf course normally you'd concatenate all the transformations once into a single combined one and then transform all the vertices by the result. You can get sensible names for the intermediate transformations by taking the first part from the leftmost particle, and the last from the rightmost particle. So:\n\nTrans ~XXX_from_YYY = world_from_tank_body * tank_body_from_tank_turret * tank_turret_from_tank_barrel;\n\nFirst part:\n\nTrans @@world@@_from_YYY = @@world@@_from_tank_body * tank_body_from_tank_turret * tank_turret_from_tank_barrel;\n\nLast part:\n\nTrans world_from_@@tank_barrel@@ = world_from_tank_body * tank_body_from_tank_turret * tank_turret_from_@@tank_barrel@@;\n\n...and then of course we'll do:\n\nfor ( i = ...each vertex... )\n vertex_@@world[i] = world@@_from_tank_@@barrel * vertex_barrel@@[i];\n\nEasy and foolproof.\n\n\nThis also helps in reverse. When you need to find a specific transformation, how do you know which transformations to combine, in what order, and do you need to invert any of them? When you invert a transformation, you swap the ends of the name. So say I have the transformations world_from_body and world_from_turret, but I need to know body_from_turret. Well you can't stick the two transformations together either way around and have the parts match - must need to invert one, and taking the inverse of a transformation swaps the ends. But which one to invert? Again, there's only one combination where everything will work:\n\nTrans body_from_world = world_from_body.Invert();\nTrans body_from_turret = body_from_@@world * world@@_from_turret;\n\n...and again notice that the result takes its first part from the leftmost word "body" and the second part from the rightmost word "turret".\n\nSo sticking closely to this sort of naming scheme minimises confusion and means that it's fairly simple to check the maths by checking the grammar.
Moore's Law (even when properly stated!) is a wonderful thing. Code gets faster and faster and it allows us to do more stuff than before, or we can do the same stuff but with less effort. One of the ways we can use less effort is to use higher-level languages, and they let us do the same amount of coding but quicker (time to market) or we can do more of it and handle more complex projects.\n\nThat's all great, but //by how much?// Moore's Law implies a doubling every 18 months or so, but what do we actually get? I'm going to use five systems I know pretty well - the Z80 in the ZX Spectrum, the 68000 in the Atari 520STFM, the Pentium, the ~PowerPC inside the Xbox 360, and the latest Nehalem. Numbers might be approximate in places for various reasons (e.g. my memory is crap), and I'm going to only focus on integer performance to keep the playing field level.\n\nZ80: The [[ZX Spectrum|http://en.wikipedia.org/wiki/ZX_Spectrum]] launched in 1982 with a 3.5MHz [[Zilog Z80|http://en.wikipedia.org/wiki/Z80]], and it was my first computer, and it was pretty awesome for its time. It certainly launched a lot of British coding careers. Each simple instruction needed around 4 clocks to execute. No cache.\n\n68000: [[The Atari ST|http://en.wikipedia.org/wiki/Atari_st]] launched in 1985, though I was a late-adopter and got mine in 1987 (just months before the Amiga A500 came out!). I'm going to call 1986 the first year it was cheap enough to become common with the launch of the first real "games machine" version - the 520STFM. It had an 8MHz [[Motorola 68000|http://en.wikipedia.org/wiki/Motorola_68000]], and each simple instruction also took 4 clocks to execute. No cache.\n\nPentium ~P54C: I remember consciously buying my first PC with a 386SX25 and at some point I had a 486DX66 (I remember because the turbo button toggled the stupid 7-segment display on the front panel between 66 and 8 - how cheesy!), but my memory of specific machines is hazy after that as I changed them so often at both work and home, but at some point I clearly must have owned a Pentium 1 of some description. However, the reason I pick the [[P54C|http://en.wikipedia.org/wiki/P5_%28microprocessor%29#P54C]] is that's what we based the first Larrabee core on, and in terms of integer performance we didn't change much apart from frequency, so I know it well. It can issue two simple instructions (load, store, arithmetic) per clock, and I'm going to choose the first 75MHz version as the data point. It was released in 1993, but only widely available in 1994. These had a built-in L1$, but most motherboards did not have the optional external L2$. I have been informed that memory latency was around 15 clocks on most motherboards, though I never measured it myself.\n\nXenon: the [[360's PowerPC|http://en.wikipedia.org/wiki/Xenon_%28processor%29]] is not a great core - I think it would have run twice as fast if they'd designed it to be clocked at half the speed. But for these purposes I'm going to ignore my gripes with it and (like all the other cores here) focus on reasonable peak throughput. Released in late 2005, it's dual-issue at 3.2GHz. It had an L1$ and L2$, and main memory latency was around 500 clocks (rather high even for the time because all memory requests had to go via the GPU, and there was also some encryption hardware adding a bit of latency in there).\n\nNehalem: the latest released Intel CPU architecture (Westmere is a shrink, so not an //architecture// as such, and Sandy Bridge is not yet officially released), [[Nehalem|http://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29]] is capable of sustaining 4-wide execution of simple instructions (to compare properly with all the other cores in this list). Is has L1$, L2$, and a big shared L3$. The data from a helpful [[Anandtech test|http://www.anandtech.com/show/2542/5]] says a 2008 Nehalem clocked at 2.93GHz has a main memory latency of 46.9ns (or 137 core clocks), and if you think that number is really low compared to the Xbox, you're right - keeping it low was a big focus for that team.\n\nOK, so where did that trip down memory lane get us?\n\n{{{\nName Year MHz CPI MIPS Mem latency Latency in ns\nZ80 1982 3.5 4 0.9 4 1143\n68k 1985 8 4 2 4 500\nP54C 1994 75 0.5 150 15 200\nXenon 2005 3200 0.5 6400 500 156\nNHM 2008 2930 0.25 11720 137 47\n}}}\n(~MHz = clock speed, CPI = Clocks Per Instruction, MIPS = Million Instructions Per Second)\n\nWe basically got 10,000 more instructions per second, but memory latency only improved by about 25. So that's about a 1.44x growth in instruction speed per year, but only a 1.13x growth in memory speed. Practical processing power has grown even faster because of things like having more bits, floating point, SIMD, etc but ignore those for now.\n\nWe should be able to use all that power to help us write more complex programs, not just run the same programs faster. But just how far can we go with the languages? I took two common operations - member access of an object, and a function call, and looked at how many instructions and memory accesses that were likely to miss the cache it took to do each of them in three different languages - plain old C (standing in for assembly on the oldest two ~CPUs), C++ with multiple inheritance, and a duck-typed language such as Python. I came up with the following:\n\n{{{\n Instructions Memory misses\nMember access\nC 1 1\nC++ MI 4 2\nDuck-type 20 3\n\nFunction call\nC 5 1\nC++ MI 8 2\nDuck-type 30 3\n}}}\n\nThe instruction counts for the duck-typing are basically a guess, but it turns out they don't affect the answer that much anyway. I'm also not sure if there's 3 dependent memory accesses or 4 for duck-typing - I couldn't actually find a low-level description of what the common operations are. Saying "it's a hash table" doesn't narrow it down all that much. If anyone does actually know what specific algorithms are used in common duck-type languages (and I ignored the fact that most of them are bytecode being run through an interpreter rather than native code), let me know what the numbers should be.\n\nI assumed the "language of choice" on the first three ~CPUs was C, on the Xenon it's C++ and on Nehalem it's duck-typing. So a table showing thousands of the operation in question per second, and the speedup relative to the Z80:\n\n{{{\n Member access Function call\nZ80 (C) 440 1.0 150 1.0\n68k (C) 1000 2.3 330 2.3\nP54C (C) 4800 11.1 4300 29.4\nXenon (C++) 3200 7.3 3200 21.9\nNHM (DT) 7000 16.1 7000 48.0\n}}}\n\nSo from Z80 to ~P54C we got 11x the perf in member access and 30x the perf in function calls sticking with the same language. We can clearly see that using multiple inheritance on the Xenon is a bad idea - it has pretty large memory latency and it suffered. According to these (admittedly rather pessimistic) numbers, absolute performance actually decreased from using plain C on the Pentium, and that reflects what a lot of people actually see if you don't watch what you're doing. But the Nehalem has much better memory latency, so that should save us, right? Well, if you switch to duck-typed languages at the same time, it doesn't actually help much. Function calls are a bit faster because the overhead of the actual call itself (pushing stuff onto the stack etc) took significant time on the older ~CPUs, but now the 4-wide Nehalem just gobbles them up. What it can't get rid of are the extra indirections, and it means there's no significant difference between accessing a member and calling a member function!\n\nMore interestingly, this implies that performance has barely moved going from C on a Pentium to Python on a Nehalem. Is it really true that that taking code you wrote in 1994 and porting it to Python means it will run at basically the same speed? No, of course it's not that bad - I've clearly approximated wildly for simplicity and to make my point. Nehalem has some gigantic caches on it - its combined L1 caches are as large as the entire addressable memory on a Z80. The L3$ is 2Mbytes per core and its distributed so if you have a four-core chip it does actually have 8Mbytes total. A lot of those memory accesses will manage to hit in the L3$, and that's only 39 cycles away.\n\nBut nevertheless the point is still there. Memory speeds are scaling very slowly compared to our common perception of processing speed, and even though we all know this, we sometimes lose track of the relative magnitudes, assuming that the new features are only a slight performance hit. The recent "Data Oriented Design" movement from folks like Mike Acton is basically trying to remind people that all this extra performance is contingent on caches working well, and that even code we think of as a minor processing load can start becoming a significant performance hog if it misses the cache all the time. These chicken-scratch numbers show that the sophisticated language features are basically becoming mainstream as fast as processing power can increase.\n\n(warning - extreme speculation ahead!) That might not be too surprising though if the size of the job is limited by other things. Like, say, the amount of complexity that the player's brain can cope with in a minute. Maybe what's happening is that the amount of gameplay-related code happening per minute is asymptotically approaching these various limits, and as it does we find that we can use this extra power to write that code in higher-level languages and increase productivity, iteration speed and robustness. That's not a bad thing in and of itself when confined to code that is inherently limited by the human on the other end of the controller (and thus not following Moore's Law), but it does cause problems as other areas of code adopt these languages, either because of coder choice (the languages are more productive) or because of interoperability with the gameplay code (as opposed to using two different languages duct-taped to each other with something like SWIG).\n\nThat's when Acton and the DOD folk get ranty, as it can be very difficult to recover this performance. It's not performance tuning as we used to understand it - focused in the traditional "kernel" areas of high number crunching (graphics, animation, sound, raycasting, etc), or even in certain "hot functions" that you can see in a profiler and sit and stare at for a week. It's all over the code - ever function call, every member access. None of it shows up on well on traditional profiling tools, and because it's all to do with how nice you are to caches, which are unpredictable and temperamental beasts, its incredibly sensitive to both Heisenburg and Butterfly effects.\n\nThe traditional method of games production is to write the thing, run it at low rez or with simple assets, pretty much ignore the fact that the thing runs slowly because 90% of that time is the unoptimized number-crunching kernels, and assume that in the last quarter or so the performance ninjas will optimize the hell out of them and quadruple the speed. Instead what's happening is that during development of the game only about half the execution time is spent in kernels, and when the ninjas come in and solve that half you only get a modest framerate boost, a profile as flat as a pancake, no idea where to even start looking for perf, and now the ninjas are screaming at some sobbing level designer about the speed of his Python hash table lookups.
Volker Schoenefeld took pity on my pathetic maths knowledge. I used to know all this stuff back in university, but it's atrophied. He sent me some links to some excellent tutorials and papers he's done on SH and the maths behind them. Probably the best place to start is his paper at [[http://heim.c-otto.de/~volker/prosem_paper.pdf|http://heim.c-otto.de/~volker/prosem_paper.pdf]]. He also has a Flipcode entry about it: [[http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=65274|http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=65274]] and the source to go with it [[http://heim.c-otto.de/~volker/demo.tgz|http://heim.c-otto.de/~volker/demo.tgz]]. Many thanks to you, Volker.
Well, it never rains but it pours. OK, not a true reinvention this time, but a sort of co-invention. Sony just announced their EDGE libraries recently and in them was a snazzy new vertex-cache optimiser, and some people at GDC asked me whether it was my algorithm. I had no idea - haven't written anything for the ~PS3 for ages, but I knew Dave Etherton of Rockstar ported [[my algorithm|Vertex Cache Optimisation]] to the ~PS3 with some really good results (he says 20% faster, but I think that's compared to whatever random ordering you get out of Max/Maya).\n\nAnyway, the EDGE one is actually a version of the [[K-Cache algorithm|http://www.ecse.rpi.edu/~lin/K-Cache-Reorder/]] tweaked and refined by [[Jason Hughes|http://www.dirtybitsoftware.com/]] for the particular quirky features of the post-transform cache of the RSX (the ~PS3's graphics chip). Turns out we do very similar broad-scale algorithms, but with slightly different focus - they tune for a particular cache, I try for generality across a range of hardware. It would be academically interesting to compare and contrast the two. However, they both probably achieve close enough to the ideal ACMR to ensure that vertex transformation is not going to be the bottleneck, which is all you really need.\n\nRemember - after ordering your triangles with vertex-cache optimisation, you then want to reorder your vertex buffer so that you're using vertices in a roughly linear order. This is extremely simple - it's exactly the algorithm you'd expect (scan through the tris, first time you use a vert, add it to the VB) but it's as much of a win in some cases as doing the triangle reordering.
Mucky Foot Productions Ltd - a games company in Guildford, a few miles south west of London. Guildford used to be the heart of UK indie games development, but it's all gone a bit quiet now. I worked there from late 1999 through late 2003, when it all went pear-shaped and the company folded. Worked on some fun games.\n\nI joined in the closing stages of [[Urban Chaos]] (PC, ~PS1), and then did the job of porting it to the Dreamcast (which was a fabulous machine to work on). A fun game, if a little random in places. But full of character. To my knowledge, Darci is still the gaming world's only female black cop protagonist.\n\nThen I joined the StarTopia (PC) team and took over the graphics engine, which I still play with today (http://www.eelpi.gotdns.org/startopia/startopia.html). It's a fantastic "little computer people" game - available for about $3 on Amazon these days.\n\nThen we did [[Blade II]] (Xbox1, ~PS2) - I was mainly the Xbox1 engine side of things. Not a great game, but I'm pleased with the way the graphics turned out - I managed to put in a lot of nice tech such as auto-generated normal maps, sliding-window VIPM for most meshes, and almost every asset was streamed from disk (meshes, textures, sounds, physics hulls).\n\nAfter that we started two games in parallel with a new cross-platform engine which I headed up, but we only got a year or so into development before everything collapsed. Bit of a shame really - the engine was shaping up fairly well. Over the years I've tried to document some of the techniques used - click on the Streaming and VIPM tags in the right-hand menu.
I will shortly be leaving Intel. Working on the architecture known variously as SMAC, Larrabee, MIC, Knights, and now Xeon Phi has been enormously rewarding - there is absolutely no other company on this planet that can take a few people's ideas and use them to change computing to such an extent. But I've discovered I'm really not a big-company sort of person, and seven years on one project is a personal record for me, so it's time to move on. And when Michael Abrash asks if you want to come work on [[wearable computing|http://blogs.valvesoftware.com/abrash/valve-how-i-got-here-what-its-like-and-what-im-doing-2/]] with him, there's really only one correct answer. It's the same answer I gave seven years ago when he asked me if I had any ideas about new x86 instructions...\n\nI am immensely proud of what we achieved (and almost achieved) on Larrabee, and I think the architecture is clearly the blueprint for the future. Knights Corner is a great chip, and I'm really excited about what Intel will be able to show everyone when it starts shipping. I'm sad not to be able to be a part of that any more, but I know the team will go on to refine the architecture, and also proliferate the concepts further into general-purpose computing. The next few years are going to be really exciting as the different architectures battle it out.\n\nIn one way I'm sad to be leaving the world of hardware and instruction set design. But the stakes are high, and the pressure is too. I'm looking forward to being able to go back to what I was before - a games coder, cranking out pixels and maxing out the hardware you're given. There's significantly fewer 7am meetings for a start.
Look, please try to remember, my surname ''DOES NOT HAVE AN E IN IT''. There are approximately three times as many Forsyths in the London phone directory as Forsythes, so it's not exactly uncommon. I wouldn't mind as much if people didn't add it when I'm //spelling// my name out for them. I'm saying "eff oh are ess why tee aych........." and they write an "e" and then ask me "oh, no E?" Well - there was that big silent bit where I could have easily said "E". I didn't suddenly forget or anything. Now you're going to have to write that cheque/receipt/permit out for me again, aren't you?
[[http://notepad-plus.sourceforge.net/|http://notepad-plus.sourceforge.net/]]. Like notepad, but better. Does word-wrap right, has a bunch of neat bracket-matching options, and a whole bunch of useful stuff I haven't played with yet. Launches in an instant. It doesn't replace a good custom text editor, but it's ace for hacking text files in arbitrary languages (XML, etc) when you don't have a spare few hours for Visual Studio to start up.\n\nAnd no, I'm not going to use emacs. Perverts.
I just know I'm going to get endless emails on the pedantry of this subject, but here's my brief round up of the background. You don't need to know any of this, it's just for geeky interest.\n\nThere's the [[Nyquist rate|http://en.wikipedia.org/wiki/Nyquist_rate]]. This has two meanings, but the interesting one is that if you have a limited bandwidth medium, you absolutely can't represent a signal of a frequency more than twice that of the medium. Put into graphics terms, you only have a certain number of pixels on the screen - that's your limited bandwidth medium. So you absolutely can't represent a signal (the texture) that is more than twice that size - there's no point even trying.\n\nThere's also the [[Nyquist frequency|http://en.wikipedia.org/wiki/Nyquist_frequency]]. It says a subtly different thing, which is that if you want to represent a signal of a certain frequency, you have to sample it at twice that frequency to avoid aliasing. However, this is not actually that relevant to texturing, because generally our signal is real life, which is much higher than can be achieved on a pixel display. So we're concerned with how to give a reasonable representation of a signal that is far beyond the capabilities of our hardware, not how to choose hardware to completely represent a signal.\n\nSome people also talk about the "Nyquist limit", but it appears to be an informal term. They are usually referring to how much information you can pump through a system without it looking bad. This is a different thing to the two above, which talk about theoretical limits, not practical ones. As ever in game graphics, the academia can give you a guide and set limits, but in the real-world we tend to make large assumptions and go with what looks good. But it's nice to know in this case that there is a real basis for the approximations that seem to work fine in practice.
A handy list of rules that I use to instantly decide if someone is an idiot coder or not. The way I decide this is that I tell them the list of rules, and if they're offended, then they're an idiot. If they give me useful arguments against any of my rules then they're not idiots, they are thoughtful but misguided. If they agree with all the rules instantly, they're sheep. If they argue but then agree, they are now true disciples. It's a very handy classification system, since it completely eliminates the possibility of people who might be smarter than me.\n\nThis list will grow with time. The chance of anybody agreeing with the entire list therefore tends towards zero. Thus the number of idiots in the world grows with every update. Fear the blog.\n\n* Double precision has no place in games. If you think you need double precision, you either need 64-bit fixed point, or you don't understand the algorithm. [[A matter of precision]]\n* Euler angles are evil, unless you're actually modelling a tank turret. And even then they're evil, they just happen to be an evil hard-wired into every tank turret. Try not to spread TEH EV1L further than strictly necessary.\n* Angles in general are evil. Any time you feel the urge to use angles, a vector formulation is likely to be more robust and have fewer transcendentals or branches.\n* If you can't write the assembly for the code you're writing, you're a danger to the project and other coders. Try something a bit more low-level until you do understand what you're writing.\n* If you think you can remember precendence rules beyond +-*/, you're only 99% right. Also, nobody else can remember them, so your code is now 1% buggy, and 100% unmaintainable. Use brackets.\n* OpenGL is retarded. DirectX is flaky, transient, confusing and politically-motivated, but it's not retarded.\n* [[Scene Graphs|Scene Graphs - just say no]] are useless and complex. Occam's Razor makes mincemeat of them.
We all know and love loose octrees, right? Really useful things. Have been for used for around ten years in games - about twice that long in raytracers. Well, some doofi at Intel have patented them. [[http://www.google.com/patents?id=sH54AAAAEBAJ&dq=7002571|http://www.google.com/patents?id=sH54AAAAEBAJ&dq=7002571]] Filed in 2002. Not only is this well after they were in common use, but it's two years after the publication of Game Programming Gems 1 where [[Thatcher Ulrich|http://www.tulrich.com/geekstuff/index.html]] wrote the definitive article on them. The title of the article is the fairly obvious one - "Loose Octrees". Not exactly difficult to google for, should you be investigating a patent application on the subject, and indeed the top page found is Thatcher's. Understandably, he's not that thrilled about this either: [[9th May 2007 entry|http://www.tulrich.com/rants.html]]\n\nNormally I simply avoid reading about patents entirely, and I try not to pollute anyone else's brains with them because of the incredible abuses they are put to in this utterly broken system. But in this case it's too late - you already violated this one. You thought you were safe didn't you? Just because it was invented before most of us started programming, just because loads of people have used it in tons of shipped code, just because it's been discussed ad nauseam all over the internet, and just because it's been several years since actual publication in a popular hardback book - none of that stops some idiots from patenting the existing contents of your brain. But that's fine - the sooner someone gets sued by Intel for violation, the sooner the patent can be revoked //from orbit// for gratuitous and wanton disregard for prior art and obviousness. As Kevin Flynn put it - "the slimes didn't even change the name, man!".
Somebody asked for the slides to Michael Abrash's Pixomatic GDC 2004 talk the other day, and we at RadGameTools realised that they weren't online anywhere. So [[here they are|papers/pixomatic_gdc2004.ppt]] in a somewhat temporary place until we get them on the [[main site|http://www.radgametools.com/pixomain.htm]].\n\nSomeone also pointed out that the Dr. Dobbs Journal articles on Pixomatic are also available: [[part1|http://www.ddj.com/184405765]] [[part2|http://www.ddj.com/184405807]] [[part3|http://www.ddj.com/184405848]]. A fascinating journey into the world of the software renderer. Even in today's GPU-dominated world, it's not dead yet - and it's getting better.
A nice chap called Benedict Carter has been working on a fun zombies-vs-marines games called "Plague" - check the blog here: [[http://plague-like.blogspot.com/|http://plague-like.blogspot.com/]], and there's a download link there as well - you can get source or just a Windows EXE. Handy hint - the concussion grenades are really powerful, but have a huge range and you can easily kill yourself with them if you're not careful - stick with the incendiaries to start with (press the "9" key at the start of the game!). And before you start moaning it gets slow when you lob a bunch of molotovs around and set fire to huge forests - it's written in Python, and it's a fun home project, so shush.\n\nAnyway, why do I mention this? Because he got some inspiration for the way fire spreads from an old article on [["Cellular Automata for Physical Modelling"|http://www.eelpi.gotdns.org/papers/cellular_automata_for_physical_modelling.html]] that I wrote a while back, and was kind enough to tell me about it. I'm still not super happy with the article, and particularly the way it conflates "temperature" with "heat energy" - they're not the same thing at all of course. But it works after a fashion, and it's fairly explicit about how you need to balance having just enough realism to create the effect you want, without too much realism that it's hard to tweak or too slow to run. Game physics is not like real physics and should not be!\n\nSo, fun to see some old ideas made into an actual enjoyable game. And of course always fun to see what people get up to in their spare time. My brain has been so full of the SuperSecretProject for so long, my outside projects have basically stopped - hence no blog posts for so long. But the veil of secrecy is slowly lifting, so hopefully I'll be able to talk more about that stuff in a bit.
Everyone should have something like this in a header file in every project. It just makes life simpler:\n\n{{{\ntypedef signed char sint8;\ntypedef unsigned char uint8;\ntypedef signed short sint16;\ntypedef unsigned short uint16;\ntypedef signed long sint32;\ntypedef unsigned long uint32;\ntypedef signed __int64 sint64;\ntypedef unsigned __int64 uint64;\n}}}\n\n...and you have one of these for each platform and/or compiler of course. Note the pedantic use of "signed" and "unsigned" because the default is not that well-defined for things like chars.\n\nYou then use "int" when you need something fast with a poorly-specified number of bits, but otherwise use one of the above. If you have data you're going to save and load, you ONLY use data types that have specified bit sizes. Anything else is just asking for trouble and woe. There's also a reasonable justification for having bool8 and bool32, since the size of "bool" is also not well-defined. Some people also define something like "Bool" to cope with older C-only compilers that don't support "bool" natively at all (many used to semi-officially support "BOOL").\n\nYou can also do the same with float32 and float64, though there's far less confusion there - "float" and "double" are very consistent across platforms.\n\nEdit: Reaven Khali pointed me to a BOOST header file that does most of this for you: [[boost/cstdint.hpp|http://www.boost.org/doc/libs/1_38_0/libs/integer/cstdint.htm]]. I don't personally like the use of _t at the end, or making it implicit that ints are signed, but you can very easily typedef the names again to your own preference (some people prefer even shorter names like s8, u16, etc), but still leave the BOOST header to figure out all the compiler config problems for you.\n\nEdit 2: Mark Baker (hi Mark!) pointed out that C99 has the same header as a standard lib file: [[Wikipedia entry on stdint.h|http://en.wikipedia.org/wiki/Stdint.h]]. They appear to use the same naming convention (not an accident, I'm sure). I don't know the pros and cons of using <stdint.h> version versus the BOOST version, and //I don't want people to email me about it// because I am long past caring about code-standard wars (especially not the ~BOOST-is-awesome vs ~BOOST-is-the-devil one). I'm just writing down the options.\n\nEdit 3: Sean Barrett solves problems. He solved this one as well: [[sophist.h|http://nothings.org/stb/sophist.h]]. Brian Hook also solves problems: [[POSH|http://www.bookofhook.com/poshlib/posh_8h.html]]\n\nThe most important thing is your code uses some sort of reasonably-named standard and isn't just littered with "unsigned int" that you have to struggle to search-and-replace later. Exactly what that convention is and what header you use to define it is easily changed later as you change platforms. So don't stop to think - just cut'n'paste the top snippet into your code RIGHT NOW and start using the types, and you can go investigate changing it later.
What we think of as conventional alpha-blending is basically wrong. The SRCALPHA:INVSRCALPHA style blending where you do FB.rgb = texel.rgb * texel.a + FB.rgb * (1-texel.a) is easy to think about, because the alpha channel blends between the texel colour (or whatever computed colour comes out the pixel shader - you know what I mean) and the current contents of the framebuffer. It's logical and intuitive. It's also rubbish.\n\nWhat you should use instead is premultiplied alpha. In DX parlance it's ONE:INVSRCALPHA, or FB.rgb = texel.rgb + FB.rgb * (1-texel.a) (and the same thing happens in the alpha channel). Notice how the texel colour is NOT multiplied by the texel alpha. It's a seemingly trivial change, but it's actually pretty fundamental. There's a bunch of reasons why, which I'll go into, but I should first mention that of course I didn't think of this and nor is it new - it's all in a 1984 paper [[T. Porter & T. Duff - Compositing Digital Images|http://keithp.com/~keithp/porterduff/]]. But twenty years later, most people are still doing it all wrong. So I thought I'd try to make the world a better place or something.\n\nAnyway, Porter & Duff talk about a lot of things in the paper so you might not want to read the whole thing right now, but the fundamentals of premultiplied alpha are easy. "Normal" alpha-blending munges together two physically separate effects - the amount of light this layer of rendering lets through from behind, and the amount of light this layer adds to the image. Instead, it keeps the two related - you can't have a surface that adds a bunch of light to a scene without it also masking off what is behind it. Physically, this makes no sense - just because you add a bunch of photons into a scene, doesn't mean all the other photons are blocked. Premultiplied alpha fixes this by keeping the two concepts separate - the blocking is done by the alpha channel, the addition of light is done by the colour channel. This is not just a neat concept, it's really useful in practice.\n\n''Better compression''\nConsider a semi-translucent texture with a variety of colours in it. With conventional blending, the colour of alpha=0 texels is irrelevant you think - because hey - they get multiplied by alpha before being rendered. But it's not true - consider when the bilinear filtering is half-way between RGBA texels (1,0,0,1 = solid red) and (0,0,1,0 = transparent blue). The alpha value is obviously interpolated to 0.5, but the colour is also interpolated to (0.5, 0, 0.5), which is a rather ugly shade of purple. And this will be rendered with 50% translucency. So the colours of texels with alpha=0 are important, because they can be "pulled in" by the filtering.\n\nThe answer with normal alpha-blending is to do a flood-fill outwards, where texels with non-zero alpha copy their colours to texels with zero alpha. This is a pain to write though, and of course some alpha=0 texels have multiple neighbouring texels, and they have to do an average of them or something heuristic like that.\n\nThen you go to encode it as a DXTn texture, and you hit more problems. First of all, DXT1 can encode alpha=0 texels, but it can't encode them with any colour other than black. So all your flood-filling was pointless, and black is going to bleed in. I've actually seen this in shipping games - the edges of translucent stuff is bizarrely dark. For example, you get a brightly-lit green leaf rendered against a bright blue sky, so the result of any blending between them should be bright, but there's a dim "halo" around the leaves. Silly developer!\n\nNow OK, DXT1 isn't all that useful for alpha textures because although the top-level mipmap might only have alpha=0 or alpha=1, as soon as you start to make mipmap levels you need some intermediate levels. So let's try DXT3 or DXT5. Same problems with both of those. First of all, the bleeding of course, so fine, you write a flood-filler, whatever. The next problem is when you come to compress the texture. Your flood filler has made sure neighbouring texels are something like the solid ones, but they will be different, especially if the flood-filler has had to blend them. But DXTn compression does badly as the number of colours increases, and the flood-filler just added texel colours. What's worse, they're not particular significant colours (as mostly they're invisible, it's just the "halo" effect you're trying to prevent). But it's hard to tell most DXTn compressors about this, so they see a bunch of texels that have very different colours, and try to satisfy all of them. What happens visually is that you get significantly worse compression around the translucent edges of textures than in the opaque middle, which is very counter-intuitive.\n\nOK Mr. Smarty-pants, what's the answer? Well, premultiplied alpha. All you do is before you compress, you multiply the colour channel by the alpha channel, and during rendering change blend mode from SRCALPHA:INVSRCALPHA to ONE:INVSRCALPHA. That's it - it's not rocket-science.\n\nSo what now happens is all the alpha=0 texels are now black. Wait - but you'll get bleeding and halos! No, you don't. Let's do the half-texel example again. Let's say we have an entirely red texture, but with some bits alpha'd, and we render onto a green background. You'd expect to get shades of red, green and yellow - but the darkest yellow should be around (0.5,0.5,0) - no dark halos of something like (0.25,0.25,0), right?. So (1,0,0,1 = solid red) and (1,0,0,0 = transparent red). The second one gets premultiplied before compression to (0,0,0,0). Now we bilinear filter between them and get (0.5, 0, 0, 0.5). And then we render onto bright green (0,1,0).\n\nFB.rgb = texel.rgb + (1-texel.a) * FB.rgb\n= (0.5, 0, 0) + (1-0.5) * (0,1,0)\n= (0.5, 0, 0) + (0, 0.5, 0)\n= (0.5, 0.5, 0)\nwhich is exactly what we were expecting. No dark halos. And since when alpha=0, the colour we WANT to encode is black, DXT1 does exactly the right thing. Lucky that, eh? Well no, not really - DXT1 is ''meant'' to be used with premultiplied alpha (even though the docs don't say this).\n\nBetter still, when you encode the texture into DXT3, all the edge texels are black, whatever their neighbours are. So that's just a single colour for the encoder to fit, rather than a bunch of them. In general, this leads to more consistent compression quality for translucent textures.\n\nBy the way - DirectX has the two "premultiplied alpha" compressed texture formats DXT2 and DXT4. I'm not sure why they added them - they are rendered identically to DXT3 and DXT5. The fact that they're different formats is meant to be used as meta-data by the application using them. Thing is - you usually need far far more meta-data than that - is the texture a normal map, does the alpha channel hold translucency or is it a specular map instead, etc. Seems pointless to me. It looks like they're going away in DX10 which is good (DXT2/3 become BC2, DXT4/5 become BC3). In general I'd avoid using DXT2 and DXT4 - I have seen (early) drivers that did something strange when you used them - I think it was actually trying to do the multiply for me, which is not what I wanted. Stick to DXT3 and DXT5 for sanity.\n\n''Compositing translucent layers''\nSome rendering techniques want to composite two translucent layers together, and then render the resulting image onto a third. The common one I've seen (and used) is for impostors. This is where you take a high-polygon object in the distance and render it to a texture once. Then, as long as the camera doesn't move too much, you can just keep drawing the texture as a billboard, without re-rendering the object itself and burning all that vertex & pixel shader power on a visually small object.\n\nIt's rather more complex than that, but the fundamental point is that you're doing a render to a texture of an image, then rendering that to the screen. The problem comes when the object itself has multiple translucent textures that might overlap. So the effect you're looking for is as if you did a standard render - background, then translucent tex1, then translucent tex2. The problem is, what you're actually doing is rendering tex1 then tex2 to an intermediate, then rendering that onto the background. Let's do some maths. Assuming normal alpha-blending, what we want is:\n\nRender tex1:\nfb2.rgb = (tex1.rgb * tex1.a) + ((1-tex1.a) * fb1.rgb)\nRender tex2:\nfb3.rgb = (tex2.rgb * tex2.a) + ((1-tex2.a) * fb2.rgb)\n= (tex2.rgb * tex2.a) + ((1-tex2.a) * ((tex1.rgb * tex1.a) + ((1-tex1.a) * fb1.rgb)))\n\nSo now we try to emulate this with our impostoring:\n\nRender tex1:\nimpostor1.rgb = tex1.rgb\nimpostor1.a = (1-tex1.a)\nRender tex2:\nimpostor2.rgb = (tex2.rgb * tex2.a) + ((1-tex2.a) * impostor1.rgb)\nimpostor2.a = well ... er ... that's the question - what do I do with the alpha channel? Add? Mutiply?\nRender impostor to FB:\nfb2.rgb = (imposter2.rgb * imposter2.a) + ((1-imposter2.a) * fb1.rgb)\n\n...and you'll find that whatever blend modes you choose, you basically get a mess.\n\nBut if we use premultiplied alpha everywhere, it's a doddle. What we want is:\n\nRender tex1:\nfb2.rgb = (tex1.rgb) + ((1-tex1.a) * fb1.rgb)\nRender tex2:\nfb3.rgb = (tex2.rgb) + ((1-tex2.a) * fb2.rgb)\n= (tex2.rgb) + ((1-tex2.a) * ((tex1.rgb) + ((1-tex1.a) * fb1.rgb)))\n= (tex2.rgb) + ((1-tex2.a) * (tex1.rgb)) + ((1-tex2.a) * (1-tex1.a) * fb1.rgb))\n\nAnd we're actually doing:\n\nRender tex1 (impostor starts at (0,0,0,0):\nimpostor1.rgb = (tex1.rgb) + ((1-tex1.a) * (0,0,0))\nimpostor1.a = (tex1.a) + ((1-tex1.a) * 0)\nRender tex2:\nimpostor2.rgb = (tex2.rgb) + ((1-tex2.a) * impostor1.rgb)\n= (tex2.rgb) + ((1-tex2.a) * tex1.rgb)\nimpostor2.a = (tex2.a) + ((1-tex2.a) * (impostor1.a))\n= (tex2.a) + ((1-tex2.a) * (tex1.a))\nRender impostor to FB:\nfb2.rgb = (imposter2.rgb) + ((1-imposter2.a) * fb1.rgb)\n= (tex2.rgb) + ((1-tex2.a) * tex1.rgb) + ((1-((tex2.a) + ((1-tex2.a) * (tex1.a)))) * fb1.rgb)\n\nThat scary-looking (1-((tex2.a) + ((1-tex2.a) * (tex1.a)))) bit looks horrendous but actually reduces to a nice simple ((1-tex1.a) * (1-tex2.a)), so the final result is:\nfb2.rgb = (tex2.rgb) + ((1-tex2.a) * tex1.rgb) + ((1-tex1.a) * (1-tex2.a) * fb1.rgb)\n...which is exactly what we want - yay! So you can see that premultiplied-alpha-blending is associative, i.e. ((a @ b) @ c) is the same as (a @ (b @ c)) (where @=the alpha-blend operation), which is incredibly useful for all sorts of image compositing operations.\n\n''Multipass lighting techniques''\nA common thing in rendering is to render an object multiple times, once per light. This means you can do shadowing with things like stencil volume shadows that can only deal with a single light per pass. The problem comes with translucent textures. You still want them to be shadowed, but the math doesn't work:\n\nWhat we want is:\nfb2.rgb = ((tex.rgb * (light1.rgb + light2.rgb))*tex.a) + ((1-tex.a) * fb1.rgb)\n\nBut pass 1:\nfb2.rgb = (tex.rgb * light1.rgb * tex.a) + ((1-tex.a) * fb1.rgb)\nPass 2:\nfb3.rgb = (tex.rgb * light2.rgb * tex.a) + ((1-tex.a) * fb2.rgb)\n= (tex.rgb * light2.rgb * tex.a) + ((1-tex.a) * ((tex.rgb * light1.rgb * tex.a) + ((1-tex.a) * fb1.rgb)))\nOh dear, that's a mess. Let's try using additive blending for the second pass instead. After all, that's what we'd use for opaque objects.\nfb3.rgb = (tex.rgb * light2.rgb) + (fb2.rgb)\n= (tex.rgb * light2.rgb) + ((tex.rgb * light1.rgb * tex.a) + ((1-tex.a) * fb1.rgb))\n= (tex.rgb * (light2.rgb + (light1.rgb * tex.a))) + ((1-tex.a) * fb1.rgb))\n\n...which is closer, but still not right. OK, let's try with premultiplied alpha. What we want is:\nfb2.rgb = (tex.rgb * (light1.rgb + light2.rgb)) + ((1-tex.a) * fb1.rgb)\n\nPass 1:\nfb2.rgb = (tex.rgb * light1.rgb) + ((1-tex.a) * fb1.rgb)\nPass 2 uses additive blending, just like opaque objects:\nfb3.rgb = (tex.rgb * light2.rgb) + (fb2.rgb)\n= (tex.rgb * light2.rgb) + ((tex.rgb * light1.rgb) + ((1-tex.a) * fb1.rgb))\n= (tex.rgb * (light1.rgb + light2.rgb)) + ((1-tex.a) * fb1.rgb))\n\nMagic! Again, it actually makes perfect sense if you think about each pass and ask how much light each one lets through, and how much light each adds, and then realise that actually there's three passes - the first one occludes the background by some amount, the second adds light1, the third adds light2. And all we do is combine the first two by using the ONE:INVSRCALPHA blending op.\n\n''Additive and blended alpha in a single operation''\nIn some places in your game, you want "lerp" materials - standard alpha-blending ones that occlude the background to some extent and add their own colour in to an extent - for example, smoke particle systems are like this. In other places, you just want to do an additive blend without occluding the background - for example a fire particle system. So we have two very common blending modes usually called "add" and "lerp" or somesuch. But what if you want a material with bits of both? What if you want a single particle system that has additive flame particles turning into sooty lerping particles as they age? You can't change renderstate in the middle of a particle system, that's silly. Who can help us now? Why - it's Premultiplied Alpha Man - thank god you're here!\n\nBecause of course if you're using premultiplied alpha, what happens if you just set your alpha value to 0 without changing your RGB value? In normal lerp blending, what you get is a fully translucent surface that has no effect on the FB at all. In premultiplied alpha, you... er ... oh, it's additive blending! Blimey, that's a neat trick. So you can have particles change from additive to lerp as they get older - all you do is change the alpha value from 0 and the texture colour from a firey red/yellow colour towards an alpha of 1 and a dark sooty colour. No renderstate changes needed, and it's a nice smooth transition. The same trick works for mesh textures where you want both glows and opaque stuff - one example is a neon sign where the actual tubes themselves and things like the support brackets or whatever are opaque, but the (faked) bloom effect from the tubes is additive - you can do both in a single pass.\n\n\nSo anyway, yeah, premultiplied alpha. Use it, love it, pass it on. Then maybe we'll only take another 20 years til everyone's doing this stuff correctly.
It's the bit of circuitry on a video card that reads the framebuffer out of the RAM, puts it through a Digital to Analogue Convertor, and feeds it to your monitor. RAM+DAC you see. In the old days when VGA was a card, not a cable standard, that's pretty much all a video card was - some memory and a thing to output it to the monitor. People still use the term RAMDAC to refer to the chip that outputs DVI, even though it doesn't actually have a DAC (Digital to Analogue Convertor) in it.
Now with RSS feeds - which is handy in case I accidentally post something useful.\n\nI'm a total newb on interweb tech, but RSS is fascinating - there appears to be at least five different versions, apart from the one called ATOM which isn't a different version, it's a completely different thing altogether (...and cue Airplane gag). "Open Standards" are such an oxymoron. Precisely because they're open, they're not standard. I mean, if they were standard, you couldn't change them at will, so they wouldn't be open. So the only truly Open Standard is one you just invented that nobody knows about yet.\n\nIf an Open Standard is RFC'd in a forest, but there's no slashdot post, does anyone file a submarine patent on it?
Hmmm... not that impressed by the automagic RSS feed generator on these TiddlyWikis. There's probably lots of cunning ways to tweak it with embedded commands, but I'd have to RTFM. The problem is that every time I touch a page it gets put to the top, even if it's just minor tweaks (e.g. adding tags or corrcting typos), or adding tiddlers that are not directly part of the blog. Also, the "description" field isn't much of a description - it's the entire text of the entry. I'm sure that's not the intent of that field. So I might try doing it by hand for a bit - just add the blog bits myself. The format isn't exactly rocket-science now I have a template.\n\nTell you what, I'll set up both - the really verbose automatic one and a hand-done one that will just have the blog headlines. Then you can choose which you like best. By the way, the RSS feed for this page is the same as the web address but with .xml instead of .html. Firefox does that automagically and gives you a little orange icon to click, but it looks like a lot of other things don't, so I've added a manual link at the left hand side.\n\nIf you're still having problems with the RSS feed, yell. I'm not a big RSSer myself, so I don't really know what standard practice is for that stuff, and I'd like to tweak it now, get it right and then forget about it :-)
In other news - I think RSS is a good idea waiting for a decent implementation. There's two ways to do this - I can make the manually-edited feed have the complete text with the Wiki stuff edited out, but then you can't use the Wiki links, which is kinda the whole point of using a Wiki. Or I can continue to make it just have like a single-line description of the post, and any decent RSS reader should include the link to the full post you can click on if the one-line teaser intrigues you.\n\nSo, both of you out there using RSS, bet now! Bet bet bet bet bet! It's even easier now I added an email address to the menu on the left :-)
http://www.radgametools.com/ Where I used to work before joining Intel. Rad products include [[Bink|http://en.wikipedia.org/wiki/Bink_video]], Miles, Pixomatic and the thing I worked on for two years - [[Granny3D]].
Deano Calver has a neat article comparing ray-tracing and rasterisation. http://www.beyond3d.com/content/articles/94\n\nI basically agree with it - ray-tracing has a limited number of things it's good at, and a large number of things it's very bad at, and a bunch of things people think it's good at that are not raytracing at all and can be applied to either system. The arguments I always have with people are about scalability - how well do the two methods scale as your amount of content rises?\n\nTo put it in a very simplistic way, raytracing takes each pixel and then traverses the database to see what object hit that pixel. Whereas rasterisation takes the database and traverses the screen to see what pixels each object hit. In theory they sound the same, but with the inner and outer loops swapped (pixels vs objects) - should be the same, right?\n\nThe problem is that rasterisation has some very simple ways of not checking every single pixel for each object, e.g. bounding boxes of objects and triangles, and rasterisation algorithms that know what shape triangles are. Raytracing also has a bunch of ways of not checking every object for each pixel, but those methods are not as simple for the same win, and can take time to reconstruct when the scene changes.\n\nThe other fundamental problem is the thing in the inner loop - changing from one pixel to the next is pretty easy for a rasteriser - pixels are pretty simple things, laid out in a nice regular grid. The rasteriser can keep around a lot of information from pixel to pixel. The raytracer on the other hand is stepping through objects in the scene - they're not inherently nicely ordered, and the state of object #1 says nothing about the state of object #2. So logically you'd want objects in your outer loop and pixels in your inner loop.\n\nThat's an absurdly simplistic way of looking at things of course, but it's a handy O(n)-notation way of thinking about it. You can then start digging and find lots of other reasons why one is slightly more efficient than the rest (after all, bubblesorts are quicker than mergesorts in small lists in architectures with high branch penalties), but you still have to work very hard to overcome the ~BigO problem. And of course the real world shows a lot of harsh realities.\n\nWe'll put aside the massive dominance of hardware rasterisers, because it is somewhat of a local-minimisation problem, in that what the hardware companies know how to build is rasterisers, and they don't know how to build raytracers, so that's what they build, and therefore that's what people program for. But in the offline rendering world, where the hardware is the same - general-purpose ~CPUs - the dominant mechanism is still the rasteriser, in the form of the REYES system. It's not your standard rasteriser of course - there's a long way between what it does and what graphics cards currently do - but nevertheless the fundamental principle holds - for each object, they see what pixels it hits. Not the other way round.\n\nRaytracing has its places - it's just not The Future. But then it's possible neither is rasterising - currently neither has a great solution to the lighting problem.
Ignacio Castaño recently looked into some explicit ordering methods for regular grids. [[http://castano.ludicon.com/blog/2009/02/02/optimal-grid-rendering/|http://castano.ludicon.com/blog/2009/02/02/optimal-grid-rendering/]]. It was kinda neat to see that [[my algorithm|http://www.eelpi.gotdns.org/papers/fast_vert_cache_opt.html]] does really well considering it's cache-size-blind. I'm not quite sure why the weakness at size=12, but when writing it I did observe that on large regular grids it would get a bit degenerate and start to do long double-wide strips. Real game meshes have sections of regular grid (e.g. 10x10 chunks), but those are small enough that the algorithm quickly hits discontinuities that kick it out of the double-wide mode and into the more Hilbert-like mode.\n\nI have thought about removing what I think is the algorithm's biggest weakness - the first few triangles, when it doesn't have any data in the cache and it's basically making random choices. So my theory was - run the algorithm until half the mesh is done. Then throw the first 100 tris back in the "available" pool and run until the end.\n\nThe fun thing is that when I originally wrote the algorithm, it just needed to be ~OKish, it didn't need to be good. The real priority was it had to be fast and space-efficient, because it was running as part of the Granny3D export pipeline from Max/Maya, and any sort of half-decent ordering was better than the essentially random ordering we were getting. It was a nice accident that it turned out to be really good with only an extra day's work.
Someone just pointed out Kurt Pelzer's article in Game Programming Gems 4 called "Combined Depth and ID-Based Shadow Buffers". I don't have the book (which is stupid of me), so I can't check, but there's really no magic to the algorithm except the realisation that you //can// combine the two. After that it's all fairly obvious, so we probably have much the same technique. So I just reinvented somebody else's wheel, which is stupid of me - sorry about that Kurt.
Renderstate changes cost valuable clock cycles on both the CPU and GPU. So it's a good idea to sort your rendering order by least-cost. But you need to be careful you know what "least-cost" means - it's not just the number of ~SetRenderState calls you send down!\n\n(note - I'm going to suggest some numbers in this section - all numbers are purely hypothetical - they are realistic, but I have deliberately avoided making them the same as any hardware I know about - they're just for illustration)\n\nTypically, graphics hardware is driven by a bunch of internal bits that control the fixed-function units and the flow of the overall pipeline. The mapping between those bits and the renderstates exposed by the graphics API is far from obvious. For example, the various alpha-blend states look simple in the API, but they have large and wide-ranging implications for the hardware pipeline - depending on what you set, various types of early, hierarchical and late Z will be enabled or disabled. In general, it is difficult to predict how some render states will affect the pipeline - I have written graphics drivers, I have an excellent knowledge of graphics hardware, and I'm responsible for some right now, and I //still// have a hard time predicting what a change in a single renderstate will do to the underlying hardware.\n\nTo add to the complexity, a lot of modern hardware can have a certain limited number of rendering pipeline states (sometimes called "contexts", in a slightly confusing manner) in flight. For further complexity, this number changes at different points in the pipeline. For example, some hardware may be able to have 2 pixel shaders, 2 vertex shaders, 4 sets of vertex shader constants in flight at once, and 8 different sets of textures (again, numbers purely for illustration). If the driver submits one more than this number at the same time, the pipeline will stall. Note the differing numbers for each category!\n\nIn general, there are some pipeline changes which are cheaper than others. These are "value" changes rather than "functional" changes. For example, changing the alpha-blend state is, as mentioned, a functional change. It enables or disables different things - the ordering of operations in the pipeline changes. This can be expensive. But changing e.g. the fog colour is not a functional change, it's a value change. Fog is still on (or off), but there's a register that changes its value from red to blue - that's not a functional change, so it is usually fairly cheap. Be aware that there are some that look like "value" changes that can be both. For example, Z-bias looks like just a single value that changes from - should be cheap, right? But if you change that value to 0, then the Z-bias circuitry can be switched off entirely, and maybe the pipeline can be reconfigured to go faster. So although this looks like a value change, it can be a functional change as well.\n\nWith that in mind, here's a very general guide to state change costs for the hardware, ordered least to most:\n\n* Changing vertex, index or constant buffers (~DX10) for another one ''of the same format''. Here, you are simply changing the pointer to where the data starts. Because the description of the contents hasn't changed, it's usually not a large pipeline change.\n\n* Changing a texture for another texture identical in every way (format, size, number of mipmaps, etc) except the data held. Again, you're just changing the pointer to the start of the data, not changing anything in the pipeline. Note that the "same format" requirement is important - in some hardware, the shader hardware has to do float32 filtering, whereas fixed-function (i.e. fast) hardware can do float16 and smaller. Obviously changing from one to the other requires new shaders to be uploaded!\n\n* Changing constants, e.g. shader constants, fog color, transform matrices, etc. Everything that is a *value* rather than an enum or bool that actually changes the pipeline. Note that in ~DX10 hardware, the shader constants are cheap to change. But in ~DX9 hardware, they can be quite expensive. A "phase change" happened there, with shader constants being moved out of the core and into general memory.\n\n* Everything else - shaders, Z states, blend states, etc. These tend to cause widescale disruption to the rendering pipeline - units get enabled, disabled, fast paths turned on and off. Big changes. I'm not aware of that much cost difference between all these changes.\n\n(note that changing render target is even more expensive than all of the above - it's not really a "state change" - it's a major disruption to the pipeline - first and foremost, order by render target)\n\nOK, so that's the ''hardware'' costs. what about the ''software'' costs?\n\nWell, there's certainly the cost of actually making the renderstate change call. But that's usually pretty small - it's a function call and storing a DWORD in an array. Do not try to optimise for the least number of ~SetRenderState calls - you'll spend more CPU power doing the sorting than save by minimising the number of calls. Total false economy.\n\nAlso note that the actual work of state changes doesn't happen when you do the ~SetRenderState. It happens the next time you do a draw call, when the driver looks at the whole state vector and decides how the hardware has to change accordingly - this is caled "lazy state changes", and every driver does it these days. Think of it as a mini-compile stage - the driver looks at the API specification of the pipeline and "compiles" it to the hardware description. In some cases this is literally a compile - the shaders have to be changed in some way to accommodate the state changes. This mini-compile is obviously expensive, but the best drivers cache states from previous frames so that it doesn't do the full thinking every time. Of course, it still has to actually send the new pipeline to the card, and stall when the card has too many contexts in flight (see above). So what tends to happen is that on the CPU side, there's states that cause a new pipeline (tend to be higher on the list above), and states that don't (tend to be lower). But there's a much lower difference in cost between top and bottom.\n\n(note that although the SuperSecretProject is a software renderer, you still have a certain cost to state changes, and those costs are reflected in the above data. It's not exactly a coincidence that a software renderer and hardware renderers have similar profiles - they have to do the same sort of things. Our consolation is that we don't have any hard limits, so in general the cost of changing renderstates is much lower than hardware).\n\nOf course it's also a good idea to sort front-to-back (for opaque objects), so that has to be reconciled with sorting by state. The way I do it is to sort renderstates hierarchically - so there's four levels according to the above four items. Generally I assign a byte of hash per list item, and I concatenate the bytes to form a uint32 (it's a lucky accident there's four items - I didn't plan it that way). Also note this hash is computed at mesh creation time - I don't go around making hashes at runtime.\n\nSo now I have a bunch of hashed renderstates, what order do I put them in? What I generally do is assume the first two types of state change - changing constant buffers and texture pointers - are much cheaper than the others (also, you don't tend to change shader without changing shader constants and textures anyway, so it's a reasonable approximation). So I make some "sub-buckets" of the objects that share the same state in the last two items. In each sub-bucket I pick the closest object to the camera. Then I sort the sub-buckets according to that distance (closest first), and draw them in that order. Within each bucket, I draw all the objects that have each renderstate in whatever order they happen to be - all I care about is getting all the objects for one state together. This gets me a fair bit of early-Z rejection without spamming the hardware with too many expensive changes.\n\nIn theory by sorting within each sub-bucket you could get slightly better driver/hardware efficiency, but I suspect the effort of sorting will be more expensive than the savings you get. I've not measured this, so this is just my experience talking. By all means if you have the time, test these hunches and let me know if my intuition is wide of the mark. It wouldn't be the first time.\n\nNote that obviously I do the "first nearest object" as I'm putting the objects into the sub-bins, not as a later sorting stage, and there's a bunch of other obvious implementation efficiencies. And I use a bucket-sort for Z rather than trying for high precision. So although it sounds complex (it's surprisingly difficult to explain in text!), the implementation is actually fairly simple and the runtime cost is very low.
The range of names is mind-boggling.\nSSE\n~SSE2\n~SSE3\n~SSSE3\n~SSE4\n~SSE4.1 (maybe it's like ~SSE4 with an extra low-frequency channel?)\n~SSE4.2\n...and now ~SSE5\n\nI'll be so glad when this instruction set is replaced by something more sensible. Although ~SSE5 does finally add ternary instructions and multiply-add.
I finally finished a fairly tedious but necessary part of [[Dwarf Minecart Tycoon]] - save and load. There's two parts to this - how, and why.\n\n''Why?''\n\nWell duh - so I can load and save games! No - it's not that simple of course. The real question is why do it now, when there's barely any game there? Why not wait until the end of the project when everything's more stable? The answer is that the ability to snapshot and restore the game gives you a bunch of really useful abilities that will make life simpler while developing the game. Of course you have the extra hassle of keeping the save/load system up to date as you change it, but I still think these are worth the effort.\n\n1. Easier testing. If I need to test the pathfinding abilities of a dwarf through a complex sequence of bridges and doors, I need that complexity in a map. If I have to create that map from scratch each time, I'm going to skimp on it and just hope the algo works. And as we all know, if you didn't test it, it doesn't work :-) Whereas with save/load I can just create the map once and reload each time as I'm testing or debugging.\n\n2. Stress-testing. Once I have the dwarves doing good pathfinding in a variety of complex environments, I can periodically check that they still work as expected in these environments, and that a change I made recently to fix one bug didn't just break lots of other things. Indeed, I can do a complete rewrite of the system if I need to, but if it still behaves well in the fifteen different environments I have as savegames, then I can be reasonably sure I didn't miss any major features during the rewrite. Taken to the extreme, this is "test-driven development" - the idea that the only criterion for "good code" is "does it pass the tests". And if it passes the tests but still doesn't work in some other situation well then - you didn't have enough tests :-) I'm not quite that radical, but it's an interesting principle to bear in mind, and the principle does scale well to large teams and complex codebases.\n\n3. Easy debugging. This is the big one for me. For complex games like this, bugs can hide and only be apparent many minutes after they actually happened. For example, the bug will manifest as two different dwarves walking to mine out the same section of rock, when in theory you should only allow one to do it. The bug happened when they both decided to walk over there (there should be some sort of marker or arbitration to make sure the second one gets on with some other task), but it only actually manifests when the second one gets there and finds no rock to dig! Now you're sitting in the debugger, you know what happened, but the actual bug happened tens of seconds and hundreds of game turns earlier, so good luck doing the forensics on that.\n\nSo the way we solved similar problems on [[StarTopia]] will work fine here as well. Every 15 seconds or so, you auto-save the game. Use a rotation of filenames so you have the last 10 or so saves around. Then, if you get a gnarly bug like this, you can always find an earlier savegame before the bug happens, then watch the respective dwarves as they make the incorrect decisions. The other key component to this is of course repeatability - if you load an earlier savegame and it doesn't happen because the dwarf rolled his RNG differently and goes to a different bit of rock, you'll never find the bug. So it's important to make sure the game is deterministic and repeatable. I've got another blog post on the boil about [[Deterministic gameplay]]. The only thing that isn't deterministic right now is the player - although the dwarves will follow existing queued commands and directives, I'm not recording and playing back the new commands that I issue. That's not too difficult, and I might do it in the future, but because in this style of game 99% of things (and therefore bugs) happen autonomously, it should still catch most of the bugs.\n\n''How?''\n\nThe obvious way to do save and load is to write a routine that traverses through your whole world writing every object into some data structure on disk. Doing this yourself is a pain in the arse. You need to figure out what to turn pointers into - in the file they need to be an object index, or an offset from the start of the file or something like that. You also need to at the very least version-number everything, because you'll almost certainly want to change your data structures as you develop the game, but still keep the ability to load older versions of save files. Keeping that code to load old files around is really tedious and it gets grubby pretty fast.\n\nBut I have a cunning plan! Since I used to be Mr Granny3D, I happen to know that Granny has a bunch of routines for doing 90% of the boring work, and it does it really well. It's so good that Granny can still load meshes, anim, etc files created from the very first release of Granny version 2, back in 2002. It does versioning by matching field names. If fields move that's no problem. If new fields are added it initializes them to defaults (zero or NULL), and if fields vanish it ignores them. It does all the pointer fixup for you - you can save directly from in-memory structures, and structures it loads are all ready to use. It handles 32<->64 bit transitions, and also endian changes. The good thing is the routines are designed to work for non-Granny data formats - you can define your own and use them for any data you like. Yeah, I know I sound like an advert, but it is very useful.\n\nAnyway, I added granny_data_type_definition thingies to all my structures, and now Granny reads and writes them. But it's not quite that simple - still a few things to do. Granny will store out the canonical definitions of the base types, e.g. I have a train type called "Fast Red Train" which has stats for power, acceleration, sprite use, etc. And all of the trains of that type reference it. Granny will happily save that structure out, but when I load it back in, I don't want to create a new base type - they're meant to be hard-coded into the game. So I need to go through the train types, see which ones are of that type and map them to whatever the current game has hard-coded as the "Fast Red Train", even if it has slightly different stats. Of course if that name doesn't exist any more I can choose to map it some other way - closest similar stats or somesuch.\n\nProbably the biggest hassle was the enums. I have a lot of enums in the game - peep types, object types, train types, order types, block types, etc etc etc. I'll certainly want to add to the list, and sometimes change the list, and maybe even rearrange the ordering. In the save file they're just stored as integers, so how do I deal with this? Well, I have a neat hack from [[Charles Bloom|http://cbloomrants.blogspot.com/]] and others to create both the enum and the list of enum strings at the same time. This is normally intended for printing the names out:\n\n{{{\n#define TrainState_XXYYTable \s\n XX(TS_Stuck) YY(=0), \s\n XX(TS_Going), \s\n XX(TS_Waiting), \s\n XX(TS_Stopped), \s\n XX(TS_LAST)\n\n#define XX(x) x\n#define YY(y) y\nenum TrainState\n{\n TrainState_XXYYTable\n};\n#undef XX\n#undef YY\n\n#define XX(x) #x\n#define YY(y)\nconst char *TrainState_Name[] =\n{\n TrainState_XXYYTable\n};\n#undef XX\n#undef YY\n}}}\n\n...and then you can do things like\n\n{{{\nprintf ( "Movement type %s", TrainState_Name[Train->Movement] );\n}}}\n\nBut you can also make part of the savegame state point to these arrays of strings and have Granny save them out as an array of strings. So now when you load the savegame structure in, Granny will have a table of the strings loaded. You can then compare the two tables with strcmp(). If they match, fine. If they don't you can construct an old->new mapping table and remap them. Obviously if an old enum is removed you need to write some code to deal with that, but that's not very common, and probably the cleanest thing to do is write the conversion code, convert all your old save games (by loading them and then saving them again) and then delete that code.\n\nThe biggest annoyance with Granny's loading routines is actually a positive feature of them in normal circumstances. When Granny loads a file, it does so using just one big allocation. That's a good thing for reducing fragmentation if the file is a mesh or an animation or something like that that is not usually manipulated by the game once loaded. The problem is in my case it's a world, and the parts of that world need to be mutable and separately allocated, so they can be separately freed and so on. I can't free memory in the middle of that big allocation - that'll cause nasty problems. The solution is I have a ~CopyWorld routine (and a ~CreateWorld and ~DestroyWorld). This does what it says on the tin - it copies the world. So the loading process is:\n\n* Load Granny world file.\n* Get Granny to do version converts if needed.\n* Check if the enum string arrays changed and convert the in-object enums if so.\n* Remap base types (e.g. point the loaded "Fast Red Train" to the global "Fast Red Train")\n* ~CopyWorld from the loaded Granny one to create a new one made of lots of allocations.\n* ~DestroyWorld on the currently running world.\n* Free the world in the single block of memory Granny loaded.\n* Switch to the world we just loaded.\n\nSaving is the opposite:\n\n* Copy the current world to a new one.\n* Clear any cached data (no point in saving that and bloating the file).\n* Point the "savegame" state at the enum string arays so they get saved.\n* Get Granny to write the file out.\n* ~DestroyWorld on the copied version.\n\nWait... why do I need to copy the world first? Two reasons, both kinda minor. First - I want to nuke the cached data so Granny doesn't save that, and do some pointer fixup that maybe the real game won't appreciate. Second - after the copy, I could fork the savegame stuff into a separate thread. So if the Granny conversions or the disk access do cause thread stalls, they won't have much on an effect on gameplay speed, and that's actually kinda cool - it means you can have autosaves all the time without a nasty stutter as it does so. So yeah, not that important, but once you have a ~CopyWorld routine, you just want to use the sucker everywhere. OK, I'll admit it - it's actually ~DestroyWorld I like calling the most. "You may fire when ready."\n\n''Are you sure this works?''\n\nOh yes. I'm well aware that one of the problems with normal save/load code is it doesn't get called much, so it doesn't get tested much. To fix this, I have a debug mode that every single game turn calls ~CopyWorld twice, calls ~StepGameTime() on the original and the first copy, compares the two to make sure they're the same, and then calls ~DestroyWorld on the two copies. Why make the second copy? Well, so if the comparison fails, I can step the second copy and watch what changes. Then if the copies agree, I screwed up the ~CopyWorld routine. And if all three disagree, then I probably have a dependency on a global somewhere (e.g. a RNG I forgot to move into the world, or dependency on traversal order or somesuch). It's not fast - the game runs at about quarter speed - but it does function well enough to do it every now and then for a few minutes.\n\nThen to test save/load I do the obvious thing - I save the first copy, destroy the first copy, step the original world, load the first copy, step it, and compare it. Now if there are mismatches, I screwed up the save/load routines. I can run it so this save/destroy/load/compare happens every frame. It's not exactly quick (though not as slow as you'd think - about a timestep every half second in debug mode - Windows file caching seems to work pretty well!), but I have left games running for days with my little dwarves and trains running around with queued orders building, delivering, etc. It seems pretty robust.\n\n''Conclusion''\n\nIt was a lot of rather tedious work, but actually not as horrible or hard to debug as it could have been. And using Granny's routines should make it moderately easy to maintain when I change the structures. And I'm glad I now have the easier debugging that autosaves and deterministic gameplay give you, because I've already had evil bugs like that and had to fix them through just thinking hard. I hate thinking hard, because you never really trust the result - maybe you fixed it, maybe you just changed the initial conditions enough to avoid it this time. I'd rather watch it actually happen in a debugger so I know exactly what to fix, then replay it and watch it not happen the second time.
A [[Scene Graph|http://en.wikipedia.org/wiki/Scene_graph]] is essentially a method of rendering where you place your entire world into this big graph or tree with all the information needed to render it, and then point a renderer at it. The renderer traverses the graph and draws the items. Often, the edges between the nodes attempt to describe some sort of minimal state changes. The idea is fairly simple - we're computer scientists, so we should use fancy data structures. A tree or graph is a really cool structure, and it's great for all sorts of things, so surely it's going to be good for rendering, right?\n\nIt's a great theory, but sadly it's completely wrong. The world is not a big tree - a coffee mug on a desk is not a child of the desk, or a sibling of any other mug, or a child of the house it's in or the parent of the coffee it contains or anything - it's just a mug. It sits there, bolted to nothing. Putting the entire world into a big tree is not an obvious thing to do as far as the real world is concerned.\n\nNow, I'm not going to be a total luddite and claim that trees are not useful in graphics - of course they are. There's loads of tree and graph structures all over the place - BSP trees, portal graphs, skeletal animation trees, spatial octrees, partitioning of meshes into groups that will fit in memory (bone-related or otherwise), groups of meshes that are lit by particular lights, lists of objects that use the same shader/texture/post-processing effect/rendertarget/shadowbuffer/etc. Tons of hierarchical and node/edge structures all over the place.\n\nAnd that's the point - there's tons of them, they all mean different things, and (and this is the important bit) they don't match each other. If you try to put them all into one big graph, you will get a complete bloody mess. The list of meshes that use shadowbuffer X are completely unrelated to the list of meshes that use skeleton Y. I've seen many Scene Graphs over the years, and typically they have a bunch of elegant accessors and traversers, and then they have a gigantic list of unholy hacks and loopholes and workarounds and suchlike that people have had to add to get them to do even the simplest demos. Try and put them in a big, real, complex game and you have a graph with everything linked to everything. Either that, or you have a root node with fifty bazillion direct children in a big flat array and a bunch of highly specialised links between those children. Those links are the other structures that I mentioned above - but now they're hard to access and traverse and they clutter up everything.\n\nFundamentally of course you do have to resolve a traversal order somehow - the objects need rendering, and there's some sort of mostly-optimal way to do it. But you're just never going to get that ordering by just traversing a single tree/graph according to some rules. It's fundamentally more complex than that and involves far more tradeoffs. Do you traverse front to back to get early-Z rejection? Do you traverse by shadowbuffer to save switching rendertargets? Do you traverse by surface type to save switching shaders? Do you traverse by skeleton to keep the animation caches nice and warm? All of this changes according to situation and game type, and that's what they pay us graphics coders the big bucks for - to make those judgement calls. You can't escape these problems - they're fundamental.\n\nBut trying to fit them all into one uber-graph seems like madness. This is compounded by the fact that most Scene Graphs are implicitly a "retained mode" paradigm, and not an "immediate mode" paradigm. My esteemed friend [[Casey Muratori|https://mollyrocket.com/forums/]] has some uncomplimentary comments on doing things with retained mode concepts, and I'm very inclined to agree. Yes, they can work, and sometimes they're necessary (e.g. physics engines), but it's not what I'd choose given an alternative.\n\nThe one major argument I hear presented for Scene Graphs is the "minimal state change" concept. The idea is that the edges between your nodes (meshes to be rendered) hold some sort of indication of the number or cost of state changes going from one node to the other, and by organising your traversal well, you achieve some sort of near-minimal number of state changes. The problem with this is it is almost completely bogus reasoning. There's three reasons for this:\n\n1. Typically, a graphics-card driver will try to take the entire state of the rendering pipeline and optimise it like crazy in a sort of "compilation" step. In the same way that changing a single line of C can produce radically different code, you might think you're "just" changing the AlphaTestEnable flag, but actually that changes a huge chunk of the pipeline. [[Oh but sir, it is only a wafer-thin renderstate...|http://en.wikipedia.org/wiki/Mr._Creosote]] In practice, it's extremely hard to predict anything about the relative costs of various changes beyond extremely broad generalities - and even those change fairly substantially from generation to generation.\n\n2. Because of this, the number of state changes you make between rendering calls is not all that relevant any more. This used to be true in the DX7 and DX8 eras, but it's far less so in these days of DX9, and it will be basically irrelevant on DX10. The card treats each unique set of states as an indivisible unit, and will often upload the entire pipeline state. There are very few //incremental// state changes any more - the main exceptions are rendertarget and some odd non-obvious ones like Z-compare modes.\n\n3. On a platform like the PC, you often have no idea what sort of card the user is running on. Even if you ID'd the card, there's ten or twenty possible graphics card architectures, and each has a sucession of different drivers. Which one do you optimise for? Do you try to make the edge-traversal function change according to the card installed? That sounds expensive. Remembering that most games are limited by the CPU, not the GPU, and you've just added to the asymmetry of that load.\n\nAnyway, this is something we can argue til the cows come home. I just wanted to give my tuppen'orth against the "prevailing wisdom" of using a Scene Graph. I'm not saying there aren't some good reasons for using one, but after writing a fair number of engines for a fair number of games, I have yet to find any of those reasons particularly compelling. They invented a lot of cool stuff in the 70s and 80s, and there's plenty of it that we continue to [[ignore at our peril|Premultiplied alpha]]. But some of it was purely an artifact of its time, and should be thwacked over the head and dumped in a shallow grave with a stake through its heart.
Wolfgang Engel and Eric Haines have managed to get the ~ShaderX2 books made [[free and downloadable as PDFs|http://www.realtimerendering.com/blog/shaderx2-books-available-for-free-download/]]. I wrote three articles that were included in ~ShaderX2 Tips & Tricks, and it's nice to have them "live" and on the net. I'd also like to convert them to HTML (as I have done some previous articles) and put them on my website - much easier to search than ~PDFs - but honestly I'm so busy with all things [[Larrabee]] that it may be some time before I work up the energy. But a quick self-review is possibly in order:\n\n''Displacement mapping''\n\nOh dear - a lot of talking, and almost no content! It's a "Tom talks for a while about the state of the industry" article. Unfortunately, the industry still hasn't figured out how to do displacement mapping, which is immensely frustrating. We're still stuck in the chicken-and-egg of having to solve three difficult problems before we get any progress.\n\nIn my partial defence, all the content is in the very first article in ~ShaderX2 - "Using Vertex Shaders for Geometry Compression" by Dean Calver. We realised early on that our articles would overlap, and since Dean had shader code ready to go, he talked about that while I talked about tools. You can think of them as two parts of a larger article.\n\nAside from moaning about why hardware doesn't do displacement mapping (and it still doesn't!), I think the main goodness from the articles is [[Displacement Compression]], which I think is still an interesting technique. There's some more details on the algorithm in my [[GDC2003 paper|http://www.eelpi.gotdns.org/papers/papers.html]], and Dean shows some good shader code. Sadly, [[MuckyFoot]] dissolved before we were able to battle-test the pipeline described in the article, so we'll never know how elegantly it would have worked in practice. I always wanted to bolt the system into [[Granny3D]]'s export pipeline, but it just wasn't a natural fit. I still think it's an excellent way to render high-polygon images without waiting for [[Larrabee]] to change the world.\n\n''Shader Abstraction''\n\nIt's always difficult to talk about engine design in articles without gigantic reams of code. I think this is one of my more successful efforts, but it's always hard to be sure. A lot of these concepts were expanded upon by others in the later ~ShaderX books (where I was sub-editor of the "3D Engine Design" section).\n\nProbably the one part of this article that I haven't seen discussed elsewhere is the texture compositing system. That was used in a production pipeline (a prototype in [[Blade II]], and more extensively in the following project), and it worked well. The biggest benefit was that the artists could use any file and directory structure they liked, and they could copy and rename textures at will, store them in any format they liked, and so on. This made life much simpler for them, as they didn't need to constantly worry about bloating disk or memory space. When it was time to bake the assets together on the disk, all that duplication was removed by hashing, and the fiddly stuff like format conversion and packing specular maps into alpha channels all happened automatically. Discussing this with other game devs I know others have invented similar systems, I've just never seen anybody write about them.\n\n''Post Process Fun with Effects Buffers''\n\nThis was the result of me trying to figure out how to structure an engine that did more than just the usual opaque and alpha-blended stuff. It adds a lot of structure on top of a standard pipeline. In one sense I'm not that happy with it - it turned out far more complex than I originally hoped, with lots of structures, dynamic allocation of rendertargets, and object callbacks. It feels like overkill for a couple of cheesy post-processing effects. On the other hand, once you have that complexity, it does make some things pleasingly elegant. What is still unproven is whether that extra complexity plays well with a full-blown game rendering engine rather than a simple demo. I still think there's some interesting ideas here.
GDC2007 was a lot of fun, though tiring. On the first day, I did two talks.\n\nThe first was about shadowbuffers. Yes, I know, again. What can I say - a GDC wouldn't be complete without me giving yet another talk about shadowbuffers - can't fight tradition. Anyway, for this one I figured out how to combine ID and depth shadowbuffers to get the best of both worlds - no surface acne, and no Peter Pan problems. Additionally, this is robust - it doesn't fall over according to the size of your world or the distance to the light or what objects are in the scene, which is a constant problem with depth-only shadowbuffers. As a nice bonus, it only needs 8 bits of ID and 8 bits of depth, so not only do you only need a 16-bit-per-texel shadowbuffer, but it works on pretty much every hardware out there. Certainly everything with shaders or register combiners (including Xbox1 and GameCube), and you could probably even get it to work on a PS2 if you tried hard enough. I think this problem can now be marked SOLVED, and combined with Multi Frustum Partitioning (which is still more expensive than I'd like, but does work), I think hard-edged shadows are also a SOLVED problem. Now we just have make the edges nice and soft - plenty of methods in the running, but still no clear winner. Lots of people researching it though, so I think it shouldn't be too long.\n\nThe other talk was only half an hour, and it was a brief look at ways to integrate lots of complex shaders into our pipelines. Most people are moving from just a few simple shaders in previous gen hardware up to the full combinatorial craziness of today's hardware. There's a lot of methods that work for 10s of shaders that just don't scale when you get to 100s. This talk explores the methods I've used in the past when throwing large numbers of shaders at multiple platforms. It's certainly not meant as gospel - there's many ways to skin this cat - but it should be a decent start for those bewildered by the options.\n\nBoth these talks are on my [[papers|http://www.eelpi.gotdns.org/papers/papers.html]] page. The shadowbuffer one also has a demo with it - let me know if you have trouble running it, it hasn't had much testing on other machines.\n\nIt was good to get those talks out of the way early on. After that I could focus on a day or two of stuff for the SuperSecretProject, and then relax and be a RadGameTools booth-babe for the rest of the show. And now I need some sleep.
Kinda silly, but the question keeps coming up - which is it? The answer is - both or either, add a space as desired. They're all the same thing, just depends who you ask. I prefer "shadowbuffer", because the word "buffer" implies a certain dynamic quality. Framebuffers and Z-buffers are single-frame entities, and dynamically updated by the hardware, just like shadowthingies. On the other hand vertex, index and constant buffers generally aren't, so it's not a universal quality. Shadowmaps are very much like texture maps - indeed, they're sampled with texture hardware, and there is a "mapping" between what is in them and the visible frame. However, I tend to think of "shadowmaps" as the reverse of "lightmaps" - they are calculated offline and not dynamically generated. Taken all together, I prefer "shadowbuffer".\n\nAs for the rest of the world - well, it's not my fault if they're wrong. But just for information, Google says:\n"shadowbuffer": 1,160 hits.\n"shadow buffer": 2,790,000 hits.\n"shadowmap": 20,900 hits.\n"shadow map": 32,600,000 hits.\n\nSo what do I know?
Phew! That was exhausting. My first Siggraph and it was right in the public eye. But I met a lot of very smart people, and got introduced to a lot of prominent academics - it's fun to be able to meet people after reading all their papers.\n\nThere was a serious amount of "Larrabuzz" around at Siggraph, and a lot of Intel people were being swamped by questions - most of which unfortunately we're not allowed to answer yet. If we were vague on some topics, we all apologize - it's tough remembering what information is not yet public. An added restriction was the length of talks - at Siggraph we had four talks on Larrabee totalling only 90 minutes, and there's only so much information you can pack into that time.\n\nThe main paper was "Larrabee: A ~Many-Core x86 Architecture for Visual Computing" (available from the ACM or your favourite search engine), and was presented by Larry Seiler on Tuesday. It outlined the hardware architecture and some of the reasoning behind the choices we made when designing it. It was extremely exciting to see this as an official Siggraph paper after surviving the whole rigorous process of peer-review and rebuttals. It's gratifying to know that the academic community finds this an interesting and novel architecture, and not just another graphics card.\n\nOn Thursday there were three talks on Larrabee in the "Beyond Programmable Shading" course. I talked about the Larrabee software renderer, Matt Pharr talked about how the architecture opened up some new rendering techniques, and Aaron Lefohn talked about the extension from there into general-purpose computing, including the term "braided parallelism" which I think is an awesome metaphor for the mix of thread and data parallelism that real world code requires. In that same course we had similar views from the AMD and Nvidia side of things, previews of the ~APIs that will be driving them from Microsoft and Apple, and also some fascinating glimpses into the future from Stanford, UC Davis, Dartmouth and id Software (cool demo, Jon!). All of us were under the gun on time limits and everyone had to drop content to fit, so well done to all of the speakers in the session - we managed to keep it on-schedule. [[All the notes are available for download here|http://s08.idav.ucdavis.edu/]] - my notes on the site are slightly longer and more detailed than the ones I spoke to (32 minutes instead of 20), so they're worth looking at even if you went to the talk.\n\nSupplemental: also see the [[Siggraph Asia 2008]] versions of the talks for a different view on things.
Just noticed that the Siggraph Asia 2008 course notes for the "Beyond Programmable Shading" course are [[available for download|http://sa08.idav.ucdavis.edu/]]. At first glance they're a similar structure to the [[Siggraph 2008]] ones, but Tim Foley has some new slants on the material that are pretty interesting. He's highlighted a slightly different take of things, and it might give people some more insights into the nature of The Bee.
It's only pretending to be a wiki.
TomF's Tech Blog
http://eelpi.gotdns.org/blog.wiki.html
There's a discussion I keep having with people, and it goes something like this:\n\n"Hey Bob - how's it going at ~UberGameSoft?"\n"Hey Tom - it's going well. I'm figuring out a neat HDR system, Jim's doing a cool physics engine, and Frank's got some awesome post-processing effects."\n"Cool. So you guys must be well into production by now, right?"\n"Kinda, but not really. We still have trouble with the whole export pipeline and there's a lot of pain getting assets into the game."\n\nAt this point, I usually start yelling at people and preparing another ranty blog entry. And thus...\n\nThere's a bunch of stuff that is neat, and it's geekily cool, and we as engineers all love to play with them and they generate some killer GDC talks and suchlike. But they're all unimportant next to the stuff that is necessary to ''ship the game''. We're now looking at team sizes of 50-150 people for 2-4 years, and that's a lot of assets.\n\nYou need the smartest coders in the building to be making sure that everyone else can work efficiently. And that means they're tools coders. Yes, "tools" - don't look at me like I said a dirty word.\n\nThe problem is that at heart a lot of us are still bedroom hackers. And that's a great mentality to have - it keeps us honest. But at those numbers, you have to have some of the dreaded word - "management". But we're smarter than the average bear, so we can use geek management, not [[PHB|http://en.wikipedia.org/wiki/Pointy_Haired_Boss]] management. And it's not management of people so much as management of data. Good people who like the work they do and the people they work with will generally manage themselves pretty well. The problems come when the data flowing between them sets up dependencies. Now obviously that's inevitable - but people can cope with obvious dependencies - like you can't animate a character very well until the rigging is done or whatever.\n\nThe problem is dependencies that aren't as obvious. They're obvious to us coders, because we're the ones who write the systems. And unfortunately the easiest way to write most systems is to have all the source data available at once, throw it in a big post-processing hopper and out comes the DVD image, and then we ship it. Fairly obviously, this is a disaster in practice. So instead we need to think carefully about the chain of dependencies and how we can isolate one person's changes from another, and allow people to work on related objects at the same time without getting stalled.\n\nThis is almost exactly the other big problem coders are facing - parallel processing. And we all know that parallel processing is really really difficult, so we put our smartest coders on the problem. And the same should happen with the tools - it's an even bigger parallel processing challenge, and the problem is not as well-defined as "make the physics go faster" because half the time when you start a game you don't actually know where you're going.\n\nSo I'm going to do some more posts on some of the detailed aspects once I get them straight in my head. But the big message is really that the old days of the hero-coder saving the day with leet gfx skillz are over. Shaders just aren't that difficult to write, HDR is a matter of detailed book-keeping, and shadows may be a pain in the arse but they're not fundamentally going to make or break your game. Good asset management on the other hand can mean the difference between shipping and going bankrupt. That's what the new breed of hero-coder needs to focus on.
Like... in a proper academic paper and everything! This was two years ago, but I only just found it. They took sliding window VIPM and fixed one of the major problems - the poor vertex-cache efficiency. They also tried to improve the memory footprint, but weren't as successful with that. But anyway - real academics writing real papers based on my stuff. Totally cool! I've had citations before in related papers, but never anyone deriving stuff directly and primarily from my work.\n\n[[Topology-driven Progressive Mesh Construction for Hardware-Accelerated Rendering|http://www.graphicon.ru/2004/Proceedings/Technical/2%5B4%5D.pdf]] - Pavlo Turchyn, Sergey Korotov\n\nYeah, OK, so they're from an improbably-named [[Finnish university|http://www.jyu.fi/]] and their paper was published in a Russian journal and it's not even listed on Citeseer. Whataddya want - a personal recommendation from Stephen Hawking? Jeez you people...
I've been writing Dwarf Minecart Tycoon for ages now, but it's always been a tiny world - 20x20x10 - which I could just fit in a a simple array. But the time has come to make it a biiiiig world. I've been careful to use wrappers at all times so I can just swap out the representation and 90% of the game doesn't care, and I've also used integer and fixed-point coordinates so there's no precision problems near the origin (see [[A matter of precision]]).\n\nBut what representation to use? The obvious thought is an octree with each node having eight pointers to child nodes, and some of the pointers can be NULL, so you can store huge sparse world. But to do random lookups is a LOT of pointer-chasing, and computers suck at those more and more as time goes on. The other problem with standard octrees is you tend to fill a lot of memory with pointers rather than actual data. One remedy is to have more than 8 children, so each node is larger than 2x2x2, for example 8x8x8. That's a third the number of traversals, and it's probably a wash in terms of memory usage.\n\nYou can also partition the world dynamically, e.g. k-D trees - which are like quad/octrees, but the split plane isn't always in the middle of the node, it's at an arbitrary point through the node. This lets you balance the tree, so you have the minimum number of walks to get to any node. On the other hand, I ''loathe'' writing tree-balancing code, and the point of doing this project is to have fun, so I'm going to arbitrarily discard these options. (if you don't dynamically rebalance, there's not much point and you might as well use a regular quad/octree).\n\nAnother way is to use a hash table instead of a structured tree. Take your world x,y,z coordinate, hash the bits of the coordinates together to make an index, look that up in a largish array, and there's a linked list hanging off that array of the possible nodes that have that hash (and if your hash is good, these nodes will not be anywhere near each other in the world). One really nice thing about this is that there's no "edge" to the world. For an octree, the top node needs to cover the whole world extents, so if you want a bigger world you have to add more layers of nodes, so more pointer-chasing. With a hash table, it all just works - it doesn't know or care how far the world is from one side to the other (assuming it fits in the int32 or int64 you use as x,y,z coordinates) - by definition the hash is modulo the size of the table, and then all it's doing is walking the list looking for the matching coordinate - the coordinates don't have to "mean" anything.\n\nBut one of the things I'd like to do is be able to run some parts of the map slower than others. Where the dwarves are running around, you want high-speed updates - things are falling, burning, water flowing, possibly air currents, etc. But away from there not much is happening - the water has settled to a level, plants are growing very slowly. So it would be good to put some large areas on a very low update rate. When doing cellular-automata stuff in the past, I found having a hierarchy of update speeds was really useful, so it would be good to use the hierarchy of something like an octree for this purpose as well. But a hash table on its own won't give you this, precisely because a good hash tries to avoid any spatial locality.\n\nFor really huge worlds, you need to start conserving memory - you can't store full detail for every occupied cube of the world. A trick to conserve memory for dense but dull areas (e.g. deep in the bowels of the earth where you haven't dug yet, or in the middle of the sea) is to take a volume and represent it in a compressed format. For example, you can use the octreetree but instead of going all the way down to the single node you stop at a higher level and store the whole thing as just one sort of block - "it's the sea". Or you can have your leaves be a decent size e.g. 8x8x8 and store that in an RLE fashion (bonus points for doing RLE with Hilbert or Morten walk patterns - though I'm not sure it actually makes much difference in this context). If the compressed format is too slow to access all the time, you could decompress to a more accessible format when creatures actually go down there, and then recompress when they leave. Hey - this ties nicely into the "low-frequency update" stuff above, doesn't it?\n\nOK, so which combination of the above? What are my criteria? Well, I'd like to have:\n\n1. A world as big as int32s will allow in all 3 directions.\n2. Reasonably memory-efficient, but given today's memory sizes being fast is probably more important - so not too much pointer-chasing.\n3. Some sort of hierarchy to allow region-sleeping.\n4. Option to region-compress (not planning to do this now, but maybe later).\n\nWhich to choose? Well, this isn't an interview question, you don't have to sit and cogitate it all yourself from first principles. The correct answer is - to the Twitternets! So I posted the question and sure enough got a bunch of great replies. I've tried to massage this into a vaguely readable format:\n\nAn interesting link: http://0fps.wordpress.com/2012/01/14/an-analysis-of-minecraft-like-engines/\n\nSebastian Sylvan (@ssylvan): hash. For tree, query pattern is crucial (Oct tree for uniform queries, kd-style for queries mainly near values). \nSebastian Sylvan (@ssylvan): hierarchical hash grid. Big contiguous chunks go in coarser grid. \n\nChris Pruett (@c_pruett): I guess I'm in the hash table camp provided you can come up with a good hash function. Oct tree + sparse data = empty nodes. \nChris Pruett (@c_pruett): Alternatively, I've seen kd trees used quite effectively for smaller datasets. Hash to nodes of kd trees? \n\nJon Watte (@jwatte): You answered your own question. For large solids, anything kd-like (even a BSP) will win. \n\nChris Pruett (@c_pruett): I'm still down for the layers-of-organizational-structure approach. Did a collision system that was a 2D grid of ~BSPs once. \nChris Pruett (@c_pruett): Theory being that resolution up close has different requirements than resolution of the world as a whole. \n\nPat Wilson (@pat_wilson): May not be applicable, but I saw this a few weeks ago: http://www.openvdb.org/index.html\n\nMark Wayland (@rompa69): Store smaller "octrees" in hashed chunks, only bits around the observer loaded? Octrees better for vis but hash for collision? \n* Per Vognsen (@pervognsen): IIRC, Minecraft uses a sparse array (hash table) of 16x16x64 chunks. Coldet is voxel tracing against fine grid. \n* Sebastian Sylvan (@ssylvan): also Oct tree != Oct trie. Uniform split is trie. An *actual* Oct tree reduces height while giving kd tree benefits. \n* Sebastian Sylvan (@ssylvan): i.e. tree is when you use input data points to pick split plane. Most people use "Oct tree" when they mean trie. \n* Sebastian Sylvan (@ssylvan): so real Oct tree: split node on a point that roughly subdivides input into 8 equal sized children. \n* Sebastian Sylvan (@ssylvan): true. Hierarchical hash trees are hard to beat for simplicity. R trees are cool to, my talk/post on them http://bit.ly/T7bLmg\n* Per Vognsen (@pervognsen): Sparse two-level arrays are a sweet spot for data structures. \n** mmalex (@mmalex): yes! Every time I stray into something more complex than this, I come back to 2 level stuff. Suits vm too \n** Per Vognsen (@pervognsen): I know Alex likes a B-tree-like root (sorted split keys) with dense subarrays. Lots of possibilities here. \n** Per Vognsen (@pervognsen): Right, and ideally each structure is flat, though it need not be (e.g. sparse voxel octrees). \n** Per Vognsen (@pervognsen): Not sure if you read Cyril Crassin's papers on those, but he uses an octree with leaf "bricks" (dense arrays).\n\nAlso:\nScott Nagy (@visageofscott): you're like two tweets from a "valve working on minecraft clone" story showing up on kotaku \n\nOK, yeah - no - [[Dwarf Minecart Tycoon]] is absolutely not a Valve project. I'm not even sure I ever want to release it - if the goal is to have fun writing it, releasing it would end the fun!\n\n\nAnyway taking into account all the above (I wasn't very clear on the ability to sleep large areas when I asked the question - 140-char limit!), the place I've started with is a "brick" at the lowest level, then a shallow octree above it, then a hash table. It's probably easiest to express with code. A ~MapNode is the container for a single hex of the map (holds a list of objects, etc), so going from the bottom up:\n\n{{{\nstruct WorldLeaf\n{\n static const int SizeXLog2 = 3;\n static const int SizeYLog2 = 3;\n static const int SizeZLog2 = 2;\n\n MapNode MapArray[1<<(SizeXLog2+SizeYLog2+SizeZLog2)];\n};\n\nstruct WorldOct\n{\n static const int MaxDepth = 4;\n static const int SizeXLog2 = 2;\n static const int SizeYLog2 = 2;\n static const int SizeZLog2 = 2;\n\n // Will be NULL iff not a leaf.\n WorldLeaf *pLeaf;\n // Some or all of these can be NULL if there's no child, or if this is a leaf.\n WorldOct *pChild[1<<(SizeXLog2+SizeYLog2+SizeZLog2)];\n};\n\nstruct WorldHashEntry\n{\n // Where the min corner (i.e. lest-most X,Y,Z) of this entry is in the world.\n IntMapPos MinCorner;\n WorldOct Oct;\n};\n\nstruct WorldHashList\n{\n ArbitraryList<WorldHashEntry*> Entries;\n};\n\nstruct World\n{\n ...other game stuff...\n\n static const int HashTableSizeLog2 = 8;\n WorldHashList HashList[1<<HashTableSizeLog2];\n};\n}}}\n\n\nI pulled all the statically-defined numbers numbers out of my arse - the size and depth of the octree, the size of the bottom leaves, etc. I'll probably do some tests later on to see what the optimal sizes and depths are, but for now these feel right - enough depth to the octree to allow efficient region-sleeping, but not too much pointer-chasing.\n\nI would have liked to make ~WorldOct something like:\n\n{{{\nstruct WorldOct\n{\n static const int MaxDepth = 3;\n static const int SizeXLog2 = 2;\n static const int SizeYLog2 = 2;\n static const int SizeZLog2 = 2;\n\n union\n {\n WorldLeaf *pLeaf[1<<(SizeXLog2+SizeYLog2+SizeZLog2)]; // If depth=2\n WorldOct *pChild[1<<(SizeXLog2+SizeYLog2+SizeZLog2)]; // If depth<2\n };\n};\n}}}\n\n...to save one extra pointer indirection per traversal, but it got really messy. I might try to fix that later.\n\nThe other thing I did was have a tiny cache of the most recently-accessed *~WorldLeaf entries, and before doing the whole tree-traversal thing, you check in that cache first. This provided a very nice speedup. How big should that cache be? Well the surprise was... 4 entries. I tried 2 and 8 and they were slower. I also tried LRU and random replacement policies, and LRU was best. Again, this needs more testing later, but it was surprising how small a good size was.\n\nI also wrote an iterator object that knows about the above structure, so when you want to query a region in an arbitrary order (e.g. for physics updates, etc) you use that iterator and it does sensible coherent traversals rather than doing top-down queries every single time. The other thing to write might be macros that do "find me the block to the negative Y" and so on, since they are also very common operations and will very frequently hit in the same ~WorldLeaf and never need to traverse any levels of the tree (note that as currently written, if you have a pointer to a ~MapNode and you know its world position, with some modulo math you can find out the pointer to its parent ~WorldLeaf, which is handy). But that's possibly for a future post, when more of the game, and hence the actual access patterns, are implemented. Until I have those access patterns, there's little point in "optimising" any further - I have no data.
I thought I'd better write some notes on my [[GDC 2003 talk|http://www.eelpi.gotdns.org/papers/papers.html]] on SH in games, as in hindsight there were some errors and some items cut for time.\n\nError - slide 5 - "To multiply two ~SHs, multiply their weights". Not true. Although you can add two ~SHs by adding their components, you can't do a modulation of one by the other that way. Doing a componentwise multiplication is a convolution, which is a completely different operation that I really don't fully understand.\n\nSlides 12 and 13 didn't tell you what the magical fConstX values were! Sorry about that. The correct values are found by taking the canonical SH coefficients (for example http://www.sjbrown.co.uk/?article=sharmonics), and then convolve with the clamped cosine kernel (i.e. to actually do the lighting with the standard clamp(N.L) equation we know and love). This is easy - multiply the first value by 1, the next three by 2/3 and the last five by 1/4. Finally, you'll probably want to convert the output from radiance (in some arbitrary units) to a graphics-friendly [0,1] range by scaling everything.\n\nAnd the results are:\nfConst1 = 4/17\nfConst2 = 8/17\nfConst3 = 15/17\nfConst4 = 5/68\nfConst5 = 15/68\n\nEasy! Also note that of course slide 13 - the vertex shader code - can be refactored to change (3*z*z-1) into just z*z by massaging the values held in sharm[7] and sharm[0] back in slide 12. That's left as a fairly simple exercise for the coder - I didn't want to confuse the slides with non-canonical equations.\n\nThanks to [[Peter-Pike Sloan|http://research.microsoft.com/~ppsloan/]] for getting this all straight in my head. He actually //understands// the fundamental maths behind this stuff, whereas I'm more of a ... er ... general concepts man myself (in other words, I suck).\n\nVolker Schoenefeld wrote a good paper on this - see [[More on SH]].\n\nRobin Green also wrote a good talk called "Spherical Harmonic Lighting: The Gritty Details" [[http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf|http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf]]\n\nAnd to complete the little circle of SH incest, a talk by Chris Oat of ATI/AMD at GDC04: [[http://ati.amd.com/developer/gdc/Oat-GDC04-SphericalHarmonicLighting.pdf|http://ati.amd.com/developer/gdc/Oat-GDC04-SphericalHarmonicLighting.pdf]].\n\n\n\nWhat I also had to miss out from the slides were some details of the implementation that I used in the Xbox1, ~PS2 and Gamecube versions of a game that never got finished before [[MuckyFoot]] vanished. This is actually the coolest bit about using SH illumination in real games. So the situation is that we wanted to give the artists a lot of control over their lighting environment. The problem is, what that usually means is they want a lot of lights. That gets expensive - the practical limit on Xbox1 and ~PS2 lights is about three directional lights with specular (on the Xbox you'll get diffuse bumpmapping on one or two of those with a pixel shader), and the Gamecube only has vanilla ~OpenGL lights (which is a lot worse than it sounds - they're not very useful in practice), and they're not fast.\n\nObviously the bar has been raised since - the 360 and ~PS3 can do really quite a lot of vertex and pixel shading, and although the Wii is still the same old Gamecube hardware, it is a bit faster. However, when I say the artists wanted a lot of lights, I don't mean 5, I mean 50. Most of them are fill and mood lights and don't need anything but per-vertex diffuse shading (i.e. not bumpmapping or specular shading), but that's still a lot of lights to ask the vertex shaders to do.\n\nThe solution we came up with was elegant. First, bake all the "mood" lights into the SH. Then of the "focus" lights (that need bumpmapping and specular), pick the brightest four shining on the object. Bake the rest into the SH. Take the brightest four, order them by brightness (1st = brightest, 4th = least bright). The fourth always gets baked into the SH as well. The third one is split into two parts - a real light and an SH fraction - and the brightness of these two always adds up the brightness of the real 3rd light. You split it up by using the relative brightnesses of the 2nd, 3rd and 4th lights so that when the 3rd and 4th lights are almost the same brightness, the 3rd light is all SH. And when the 2nd and 3rd lights are almost the same brightness, the 3rd light is all real, with no SH component. This way, as the object moves around, lights will move from being real lights to being baked into the SH nice and smoothly and you won't see any popping as they change types.\n\nSo that gives you two lights and some fraction of a third that are done properly - bumpmapping, specular, point light (instead of a directional approximation), etc. You can of course use four or five or even two. I found that two looked not that great, and four or five didn't improve matters significantly - three (well, two-and-a-bit) seems to be the the visually good number, but you should experiment yourself.\n\nThat's what we were going to do originally, and that remained the plan for the Xbox1. However, the ~PS2 didn't seem able to vertex-shade even three point lights with proper specular, so that was cut back to just one! Those paying attention will realise that actually that's some-fraction-of-one light - yes, if you were standing in between two equally bright lights, there was no proper lighting performed at all! Still, that's what needed to happen for speed, and it didn't actually look objectionable, just a bit washed-out at times compared to the Xbox1.\n\nThe Gamecube was a challenge - it has no vertex shading hardware at all - so what to do? We tried just feeding the brightest four lights into the standard ~OpenGL lights, and then feeding the zeroth element of the SH into the ambient term (which is something people have been doing for ages - way before SH came along). But it looked rubbish. The artists would put ten yellow lights spread out on the ceiling of a corridor and ten blue lights providing underlighting, and instead of doing the right stark cinematic yellow-opposite-blue lighting, all that would happen is you'd get a random assortment of a soft blue and/or yellow tinge in a random direction, and a muddy grey ambient. Looked awful. So instead the chap doing the GC version (Andy Firth) suggested we take the first four coefficients of the SH (rather than the first nine) and emulate them with OGL lights. So the zero term was still the ambient, the brightest real light was a real OGL light, and then the six cardinal directions (positive and negative x, y, z) each got a directional OGL light shining along that axis. You need six because the OGL lights get clamped on the backside of the light, so if your first coefficient is -0.5, then you need a -0.5 light shining along from the +X axis and a +0.5 light shining along from the -X axis. This was a super neat hack, and it brought the GC lighting quality nearly up to the ~PS2's.\n\nAlthough we weren't exactly //happy// with only having one light and a 4-component SH, the important thing was that the artists now had a reliable lighting model - they could put a bunch of lights in the scene and you'd something pretty close on all three platforms. They accepted that Xbox1>~PS2>GC in terms of quality, and it was //predictable//. Whereas the previous pre-SH lighting schemes would pick a random assortment of lights out of the ones they'd placed, which was very frustrating for them, and heavily limited the sort of "mood" they could set with lights.\n\n\nBut that wasn't the really cool bit. Oh no. The //really// cool bit is that you don't have to just project a directional light into the SH. You're doing this once per object - you've got a few CPU cycles to play with. So why not have some more flexible light types? The simplest tweak, which actually worked really nicely, was to give lights a volume. Normally, lights are done as point lights - infinitely small. Of course you typically set up some sort of falloff with distance to control it a bit better. I loathe the canonical OGL/DX light attenuation - it's really hard for artists to control well. Far better is a linear falloff that goes from a minimum distance (at which the light has maximum brightness, whatever the artist sets that to be) out to a maximum distance (at which the light is ZERO). The maximum distance is also nice for culling - you can actually throw the light away and not worry about it all. That's all pretty easy, and again people have been doing that for ages.\n\nThe problem is that sometimes you want a single bright light that illuminates a large area - let's say it has a brightness of the standard 1.0 at a distance of 10m. But now what happens when an object is 5m from the light? If you set the minimum distance at 10m, that object won't be any brighter, which looks odd. So move the minimum distance in and crank up the brightness. Well, your object is kinda brighter - but the lighting is still odd. Imagine the illuminated object is a sphere. So now a large part of the sphere (maybe a quarter of the surface) is saturating at full brightness, and then there's a falloff, and then still half the sphere is black. In a real room what you'd get from a bright light like that is backscatter illuminating the back side of the sphere, and indeed the shading of the sphere would still be smooth all over without any odd-looking saturation. The other effect is that really a light like that would often be physically larger than the sphere, so it wouldn't just be a directional light - the outer parts of the light would be illuminating around the edges of the sphere. And indeed as you get closer, almost the entire sphere would have some part of the light that would be able to see it directly.\n\nAnyway, so I gave the artists a physical radius in meters that they could set the light to, rather than being a single point in space. Then I do a bit of sensible-seeming maths involving the radius of the object being illuminated, the radius of the light illuminating it, and the distance of the light from the object. This gives me an approximation to how much "wrap around" I should give this light (a small object should get more wraparound than a large one). There's also an "ambient bounce" factor that the artist can set to simulate the light being in a small, white-walled room and causing backscatter as opposed to stuck in the middle of a big open space. But notice this ambient bounce still scales according to the distance between light and object.\n\nAnd then when I go to do the canonical directional-light SH evaluation as given in the slides, the ambient bounce factor just adds light into the sharm[0] ambient factor, and the wrap around factor does that and also shrinks sharm[4] through sharm[8] towards zero. This last one needs some explaining - basically what I'm doing is lerping from the real clamped cosine kernel (as given in the slides) into one that does (N.L*0.5+0.5), i.e. it has a bright spot on one side and a dark spot on the other, and a smooth blend between those, with no clamping needed. This is the fully wrapped around lighting state - when the object is right up against the light source. It should be obvious that a kernel like that is just sharm[0] = 0.5, sharm[1]-[3] = light_direction and sharm[4]-[8] = 0.0. So lerp between that and the standard directional coefficients using the wrap-around factor.\n\nThe neat bit is that this is also pretty easy to add to the per-pixel bumpmapping stuff (it's a modification of the idea of hemispherical lighting), so you can get wraparound lighting on that as well. You can also tweak the specular lighting so that your highlight size gets larger as you approach the light. This looks really nice - as objects get close to large, bright lights, they still seem to get brighter, even though the brightest-lit pixels are still maxed at full-white. It's pretty neat, and the artists loved having even more knobs to play with on their lights. Sadly, the game (and the company) was shut down before we got to the stage where they could really play with final assets and lighting, so we never found out what the system could do when given some full-time artist loving.
The first question in [[DMT]] was - what shape is the map? Squares are the obvious answer, but I've never liked them much. If you disallow diagonal movement and interaction, the code is simple, but things look ugly and distances are counter-intuitive. If you allow diagonal movement and interaction then the code's a pain - you now have to handle the diagonals, and you have to handle strange things like four people can meet and can all interact with (which in a game usually means "kill") each other, which is also counter-intuitive. And when you move to 3D you now have three cases of interactions - cube-face-to-cube-face, cube-edge-to-cube-edge, and cube-corner-to-cube-corner. It's a huge pain in the arse.\n\nHexes have always been the shape of choice for nerdcore games - wargamers swear by them for example - and they solve all of these problems. Movement looks better, and there's no strange corner cases. Another neat thing from rendering is if you want to render them as a heightfield (or indeed collide against them), they tessellate to triangles in obvious ways - whereas with squares you have to make a fairly arbitrary choice about which way to stick the diagonal. But it's not intuitive how to deal with them in code - we're so used to using a right-angled 2D coordinate system to refer to stuff. But I thought I'd bite the bullet and figure this stuff out right at the start. If it didn't work, well... go back to squares.\n\nThe system I decided on was to have three axes - X, Y, and W (Z is for height off the ground). X and Y are the two major ones, and that's how the storage is actually indexed - so I have a conventional 2D array in memory. X is where it normally is (aligned with Cartesian world X), and then Y is at 120 degrees to it rather than the usual 90. W is an implicit axis - it is midway between X and Y and the W coordinate value is equal to (X+Y)/2. Note that W is not an independent coordinate - you can't change it without changing either or both of X and Y. In practice W is never actually used as a coordinate or stored in any structure, it's really just a reference to a direction that you can move in. Of course as I write this I realise if I'd wanted proper symmetry, I would have pointed W the other way, so that X, Y and W were all at 120 degrees from each other. That would have been far more elegant, and (I now realise) analogous to barycentric coordinates in a triangle. Ah well, I don't think it makes much difference in practice.\n\nSo now movement and proximity is elegant again - things can only move or interact along the edges of their hexagons, so there's (literally) no corner cases any more. As long as you have a bunch of simple wrapper functions to say "given coordinate A, what is the new coordinate if I move in direction D", where there are six possible directions - positive X, Y, W and negative X, Y, W. In practice I numbered the directions as an enum going anticlockwise from positive X:\n\nenum ~IntMapHeading\n{\n ~MAPHEADING_PX = 0,\n ~MAPHEADING_PW,\n ~MAPHEADING_PY,\n ~MAPHEADING_NX,\n ~MAPHEADING_NW,\n ~MAPHEADING_NY,\n\n ~MAPHEADING_LAST /* Always last */\n};\n\nThe other big choice I made was to follow my own advice in the post [[A matter of precision]] and use integers rather than floats for coordinates in space and time. Along the way, try not to ever use the X,Y coordinates directly (and thus confuse myself about where W is) - always use wrapper functions. So I have things like the following, where ~IntMapPos is just struct { int32 x, y; }\n\nbool ~IsOnMap ( ~IntMapPos pos );\n~MapNode *~GetMapNode ( ~IntMapPos pos );\n~IntMapPos ~MoveBy ( ~IntMapPos pos, ~IntMapHeading Heading, ~MapSlope Slope );\n~IntMapHeading ~GetDeltaInfo ( float *pHowFar, int *pHowFarManhattan, float *pFloatHeading, ~IntMapPos p1, ~IntMapPos p2 );\n~D3DXVECTOR3 ~GetWorldDeltaFromMapDelta ( ~IntMapPos From, ~IntMapPos To );\n\n"Manhattan" distance is a funny concept on a hex grid, but it should be obvious what it means - distance in terms of moves. I don't know of any cities laid out in hexes, so Manhattan it stays.\n\nThe last function is the one that all the rendering uses - the camera data is stored as looking and rotating around a certain "target" map hex, and so I pass that target hex in as the "From" argument, and the result is a standard Cartesian float32 XYZ that everything uses for rendering data. Again, the idea is to make sure nothing else cares about this wacky hex grid, and that I don't get any precision problems when the map gets really big.\n\nOf course as you may have spotted from the above, having made my lovely 2D hex world with no corner cases, I then wanted to have a 3D (volumetric) world, and added in a Z axis, and unfortunately that means that in a vertical cross-section I have squares and that means I now have diagonal corner cases to deal with. Doh! In theory I think I could have done some sort of "orange-stacking" arrangement to offset adjacent Z layers, for example "face-centered-cubic" (which I think is the same as the Voronoi diagram of packed tetrahedrons?), but that would have made my head explode trying to think about the connectivity. So I think horizontal hexes and vertical alignment should work out as a reasonable compromise.
The game I'm most proud of working on. A "peep sim" very much in the vein of Theme Hospital and Dungeon Keeper (not much surprise really - people from both those teams joined [[MuckyFoot]] to work on StarTopia, and indeed later worked on Lionhead's "The Movies"). Lots of interesting graphics stuff to do there, and much later I even retrofitted shadowbuffers to it. More [[here|http://eelpi.gotdns.org/startopia/startopia.html]].
In a recent "what were you doing in 2001" Twitter meme I mentioned I had just shipped StarTopia then. And I got a ton of StarTopia love back, which is immensely gratifying, but I want to make something really clear - almost none of it is for me. I joined the project nine months before ship, and by then it was basically what you saw and loved. I just took over the graphics engine from Jan Svarovsky who could then focus on finishing the non-graphics stuff. I ported the engine from ~DX5 to ~DX7 (not exactly rocket-science), tarted it up with a bit of multitexture, and did all the graphics card compatibility stuff. The latter is actually a hard job, especially in 2001 when there were still tens of graphics card manufacturers each with multiple strange variants, but it's just something that needed to be done - it didn't really affect the game. I guess I did a few gameplay things here and there - we all did - I don't really remember the details. But yeah - the real heroes are all these great folks: http://www.mobygames.com/game/windows/startopia/credits with special kudos to Wayne Imlach who spent years "playing" and tweaking the Excel spreadsheet simulator with the millions of tweakable numbers in the game to make it the right blend of challenging but fair. I still enjoy playing the game from time to time - even when you know the magic behind it all.
''WORK IN PROGRESS'' - which is why it's not on the main page yet.\n\nThis is an attempt to gather together some of the knowledge I have about writing streaming resource systems. That is, game engines (rendering, sound, collision, etc) which do little or no bulk-loading of data in "levels", but can instead stream data as the player progresses.\n\nThis is mainly from my experience writing engines at [[MuckyFoot]], firstly with the game [[Blade II]] which shipped on ~PS2 and Xbox1, and then the engine used for the next two projects, neither of which were actually shipped, but were far enough along in production to pound out the problems with the system. It also contains the results of discussions with others about their streaming systems, and the problems they had with them. I suspect the success of my system has more to do with luck than planning, but the lessons are nevertheless useful!\n\nSome resource types, such as textures and to a lesser extent meshes were the space-hogs, and there were sophisticated systems to make sure we only loaded exactly what we needed at the LOD we needed.\n\n''Stuff I wrote on forums on the internets''\n\nOne day I'll arrange these into something coherent, but might be interesting even in this form.\n\n---\n\nIf you can see stuff streaming in, you're doing it wrong. It's not simple - good streaming requires pretty comprehensive support in the engine. I know from painful experience that it's difficult to slap one onto an engine retrospectively. You need aggressive and smart prefetching to hide seek times. You need good markup for things that might pop in suddenly (HUD elements, muzzle flashes and things associated with explosions such as particle system textures) to avoid paging out things that can be needed in a hurry. You need good fallbacks - mipmap levels or alternatives that are never unloaded. You especially need a lot of work with cutscenes and any other place with jump-cuts between cameras or game features like teleportation. You absolutely cannot just wait until the renderer needs the texture before you load it in - that looks like complete pants.\n\nI also think that just streaming one part of your assets, e.g. just textures or just sounds or just meshes, is doing most of the work for only a small part of the benefit. Stream everything! Textures, meshes, sounds, nav data and collision hulls all stream well, and the first three also LOD well.\n\nPeople have been streaming stuff for decades at least (at least I know I have!). People just never noticed before because they did it right I always meant to write an article or something about how to do it properly, but I never did, and I've probably forgotten half the important things. Ah well.\n\nTo answer the question - the big advantage is that when you do texture streaming right ([[Knowing which mipmap levels are needed]]), you suddenly don't care about the size of textures. Aside from running out of disk space (which OK, can still be a problem), it doesn't matter if the artists make textures a bit too big, or unevenly allocate texel space in the level - if the player never gets close enough to see, those mipmaps are never loaded. Think about the time the poor pixel-pushers spend fussing over exactly how big to make their textures and whether they have the budget - they get to spend all that time making stuff pretty instead.\n\n---\n\nIt's not simple or easy. Even smart engineers do it wrong. But many smart engineers have done it right, shipped games, and you never knew they were streaming because - they did it right! We have this sort of argument every few years with every approximation technology. Progressive meshes, texture streaming, impostors, SH lighting, etc - plenty of people do them well and nobody notices. It only takes a few to do a bodge'n'ship job and suddenly forums are full of "new technology X is a piece of shit - look at EngineY - they're using it and they look shit." Conclusion - don't do things shittily. I think we can agree on that?\n\nThe benefits on fine-grained streaming are significant. At a given visual quality you have a LOT more memory available (or at a fixed memory budget you have much higher visual quality - take your pick). That leads to far more flexibility about where you use that budget, and that means development times drop and people don't spend quite as much time obsessing over memory. It is true that this can lead to trying to stuff more in and making life hell all over again. Pro tip - don't do that.\n\nI think chunk-based streaming is the worst of both worlds. On the surface it sounds simpler than fine-grained streaming, but in practice it's not. You have all the management headaches of streaming data and prefetching in time to avoid pops, and teleports/cuts and all that headache. But now your chunks are big monolithic blobs, so you have the same detail for the thing two blocks away and around the corner as for the thing right in front of your nose, so you're still burning huge amounts of memory on stuff you can't see.\n\nWorse, in many systems the chunk has to contain everything that COULD be there, so if Cigar-Chomping Sergeant is in this level, his mesh, textures, sounds, etc need to be in every chunk in the level. So even if he's not in the scene, his stuff is loaded, and even worse if you're on boundaries between chunks it can be loaded three or four times.\n\nI've had very clever people tell me chunk-streaming is a good system, and while I respect their opinion on many things, I just cannot see the win. When we discuss the problems with streaming systems they all seem to be the same whether fine-grained or coarse, but coarse leaves so many of the advantages on the table.\n\nOr there's the hybrid - chunk-streaming loads in stuff up to a certain LOD, but then you have individual textures stream their top few mipmaps like fine-grained streaming. That one totally confuses me - you lose the two acknowledged benefits of chunks - simple memory management and predictability. And you're still capping the possible win by where you draw the line between chunk and fine-grained. Draw it at 512x512 and you're still loading a ton of stuff that isn't needed, draw it at 64x64 and now things can still look like blurry shit unless you do a really good fine-grained streaming system (in which case why not do it properly...).\n\n(technically the system I used is on that continuum since I always kept 8x8 copies of all textures around - it's 32 bytes which was part of the texture description, filename, etc)\n\n---\n\nAnother thing I see some games screw up is to take the view frustum into account when streaming. In almost every game style the camera can rotate far faster than a streaming system can keep up. So when doing calculations to see what's needed, it's important to just use visibility and distance, not view direction. It's also important to go past any visibility blockers that the player can get past in under 2 seconds, e.g. bends in corridors, unlocked doors, etc.\n\nIf you have something like a portal system, instead of doing the normal frustum-culling visibility test like you do for rendering, do a flood-fill out to a given movement distance (e.g. 3 seconds * max run speed). Then starting from that set of "potentially visible in 3 seconds" portals, do normal portal/portal clipping to do visibility culling, but ignore the viewers frustum.\n\nOn the other hand, do grant a priority boost to things that are visible. If the streaming is stressed and not able to load everything you want, you want the visible stuff loaded first. If it's not stressed (e.g. the player hasn't moved recently), you want it to still load everything even if it's off-screen.\n\n---\n\n''Outline for a book or something''\n\n_Resource framework_\n\nResources + priorities + LOD levels\nProd & decay\nDecay done with a timestamp & logarithmic decay. Preserves the idea of "twice as important as". Timestamp allows decay without constant checks.\nPriority queue\nFetch queue - 20 items long - allows optimisation of seek times. Once something is in fetch queue, it is loaded\nEvict queue - can't remove items immediately (GPU may still be using them).\nAvoiding constant PQ updates - stochastic updates instead.\nFragmentation\nDefragmentation\nChanging up a LOD level\nChanging down a LOD level\n\n_Setting priorities_\n\nScale by distance\nWalk through visible portals\nWalk through closed but unlocked doors\nWalk through locked doors with lower priority\nFlood-fill X portals after visible frustum (corridor problem)\nBreaking objects (suddenly need new textures, etc)\n\n_Cinematics_\n\nRecord & play back 5 seconds ahead\nAt decision points, have scripts add "future cameras" that are treated like real cameras with slightly lower priority.\n\n_Textures_\n\nJust-too-late\nTinytextures\nConsole mipmap streaming\nPC mipmap streaming\n\n_Sound_\n\nNo traditional LOD like meshes & textures.\nSpot effects - JTL - if not there, not played.\nBarks - JTL - if not there, played when loaded, unless later than 1 second.\nLonger music & speech - traditional buffered streaming with max-out priority bump at half-empty\n\n_Animations_\n\nIn Blade II, these didn't take that much memory, so were not a sophisticated system.\nDidn't have any LODs in our engine, but some libraries (e.g. Granny) do, so use them like texture LODs.\n\n_Meshes_\n\nPS2 did not have VIPM (too complex for available engineering time), so were either present or not.\nXbox1 used VIPM LOD with bumpmapping.\nAlmost every mesh in game had VIPM chain - most were automatically generated.\nFor simplicity, VIPM resources only had three LODs. Example for a character was min=150 tris, normal = 5k tris (same as PS2 version), max = 50k.\nNo equivalent of "tinytexture" that is always resident - meshes can't go much below 150 tris, and that's still too big (~6kbytes each)\nActual Xbox rendering was continuous LOD. No popping, finer granularity saved vertex throughput as well as for streaming.\nMeshes cause texture prods - they know what mipmap levels they need. Textures inherit modified priority of mesh.\n\n_Game objects_\n\nProd meshes with priorities.\nProd animations with sounds.\nPriority bump for player, enemy, important objects etc. Scenery and so on considered at normal priority.\nArtificial prod for alternative meshes (e.g. broken versions, "used" versions) when player nearby.\nProds for equippable objects, e.g. weapons, so when you change, you don't have empty hands for a second.\nProds for animations.\n\n_Collision objects & AIs_\n\nCollision objects have high priority. Range-bubble dependence from player - no portals.\nAIs also range-dependence. If they find a missing collision object, they freeze & request it with even higher priority.\nCertain important AIs have their own range bubbles.\nScripts can raise priorities, give certain AIs larger range bubbles, etc\nKeep raising priorities until there are no AI freezes. This data is usually small, and over-loading is usually not a problem.\n\n_Refinements_\n\nBetter PQ implementation. Maybe use buckets & only finely-sort the top bucket?\nMaybe try a hardware-like L1/L2 cache? L1 = resident, L2 = waiting to be resident.\nSeparating LOD from priority. Ideal LOD = normal priority, next one down = higher priority, next one up = low priority.\nMaybe always have two "objects" in LODdable objects - the one that is loaded, and the next higher one.\nAI LODs - use corser collision geometry, or have pre-authored movement tracks that they revert to (e.g. street pedestrians)\nTry low-rez sounds - maybe far-off sounds are fine to play back at higher compression levels? Drop to 22kHz, 11kHz, or more compressed MP3/Ogg levels?\nHave 1st second of sound in memory, as soon at starts playing, stream the rest? Many sounds <1second long - doesn't help them.\nFor sounds with multiple variants (e.g. footsteps, gunfire, etc) only have a few permanently loaded, and when they start playing, load the rest.\nSame for animations, e.g. 10 different walk anims. Or use generic walk anim until specific depressed-wounded-troll-with-limp anim has been loaded.
While I'm having a rant - strippers. STOP IT. Stop writing academic papers about generating the ultimate strips. It's all totally pointless - pretty much every bit of hardware has indexed primitives (except the PS2, which has a bunch of completely bonkers rules that are ''nothing'' like standard strips anyway). The ultimate stripper will get you one vertex per triangle. But even a very quick and dirty indexer will get you that, and good indexer (e.g. the one I put in [[Granny3D]]) will get close to 0.65 vertices per triangle for most meshes with a 16-entry FIFO vertex cache. The theoretical limit for a regular triangulated plane with an infinitely large vertex cache is 0.5 verts/tri (think of a regular 2D grid - there's twice as many triangles as vertices), so thats not too shabby.\n\nNow, it is true that once you have chosen your triangle ordering for optimal vertex-cache efficiency, you may then want to express them as indexed triangle strips. But the key is in the first word - //indexed//. You've gone to all that trouble to order your tris well - don't change that order! Given that order, you can still rotate the triangles to express things as short strips, especially when you have a strip-restart index (e.g. the newer consoles and DX10), and that is pretty efficient. Since vertices are so much larger than indices, and re-transforming them if they drop out of the cache is so expensive, it's worth spending a few extra indices to keep the vertex cache toasty warm. But generating those optimal strips is pathetically simple - precisely because you can't change the ordering of the triangles.\n\nAll those academics still writing papers on this - go find something more useful to spend your time on. We've had indexed primitive hardware for <$50 for almost a decade now. Welcome to the 21st century.
Now only secret, not //super// secret - it's [[Larrabee]].
When streaming textures from disk (see entries tagged as "Streaming"), there's no reason you must have the same format on disk as you do in memory. You're limited by the speed of the drive, so you can do some decompression there. This reduces disk footprint, which is nice, but it also reduces the distance the head has to seek between resources (because they take less disk space), which is very good. Obvious things to do are Zip-style lossless compression, but ~DXTn doesn't compress that well. So you'd want some slightly lossy compression (because hey - ~DXTn is pretty lossy already - what's a bit extra?), but people are having a tough time finding ways to start with a ~DXTn image and add further lossiness - ~DXTn's sort of compression is really unfriendly to that sort of stuff.\n\nSo the other way to do it is to start with the original image and use proper image-compression methods like JPEG, PNG, Bink, etc. Those get great compression ratios. The problem is that when you load them off disk, they decompress to something like 888, which chews video memory and can slow down rendering because it takes valuable bandwidth. We really want to compress to something smaller. It would also be very useful to be able to do procedural operations either on the CPU or GPU producing 888 data, and then compress the result.\n\nBut good ~DXTn compression is notoriously slow - the problem is searching for the two endpoint colours and the interpolation through 3D colour space - it becomes a really big searching problem. There have been some interesting technqiues - [["Real-Time YCoCg-DXT Compression" by J.M.P. van Waveren and Ignacio Castaño|http://developer.nvidia.com/object/real-time-ycocg-dxt-compression.html]] is a good round up, and also a nice exploration of some alternative encodings (store the Y channel in the alpha of ~DXT5, which de-correlates it from the the ~CoCg part which increases quality and speeds up searches).\n\nIt's a neat trick, but I went the other way. ~DXTn-style compression of 2D and 3D colour spaces is slow because the search space is large. But compressing 1D data has a far smaller search space and is much quicker. We can either use ~DXT1 with greyscale data, or in ~DX10 there is a ~BC4 format that stores single-channel data - it's basically just the alpha-channel of ~DXT5, without the colour data, so it's also 4 bits per texel.\n\nThe core of the idea is to take the image, shrink it by 4x in each direction, quantise it, and store it in a 565 format texture. Then scale that back up with bilinear filtering, and find the luminance of both it and the original image (i.e. greyscale them). Divide one luminance by the other - now you have a bunch of values that corrects the luminance of the low-rez 565 texture. Find a texture-wide scale&bias to let you store these luminances well - most of them are around 1.0, and we're only interested in the deviations from 1.0. Then store those as a full-size ~DXT1 or ~BC4.\n\nThe decompression shader code is very simple - sample both textures with the same U,V coords, scale&bias the luminance correction with globals fed in through pixel shader constants, and multiply with the result from the 565 channel.\n\nThe quality compared to standard ~DXT1 is pretty good. It takes 5 bits per texel (4 for the luminance correction channel, and 1 for the 565 texture, because it's shrunk by 16x). This is slightly larger than ~DXT1, but the visual quality seems to be very close, though I haven't done any PSNR studies. The average quality seems comparable, and the main difference is in highly-coloured areas. Here, ~DXT1 gives its distinctive block-edge errors, which are pretty objectionable. Using this technique instead blurs the colours out - essentially, you get the raw upsampled 565, with the luminance correction not doing all that much. While not significantly more correct than ~DXT1, it does not have any visible blocking, and so attracts the eye far less.\n\nI have also tried only shrinking the 565 by 2x in each direction. This obviously gives better fidelity, but increases the cost from 5 bits/texel to 8 bits/texel.\n\nIn the future I'd like to try converting to a luminance+chrominance space (e.g. ~YCbCr or ~YCoCg) and then just storing the luminance raw in the ~DXT1, and the chrominance in one half of a 4444 texture (i.e. either the RG or the BA channels). That way you can share the 4444 between two different textures and amortise the cost. It brings the 4x4 downsample to 4.5 bits per texel, or the 2x2 downsample to 6 bits, both of which are pretty acceptable. I'm a little curious about the quantisation there - is 4 bits enough for chrominance? You don't see quantisation problems on the current 565 texture because the luminance scaling corrects for it, but this would be a somewhat different colourspace.\n\nI'd also like to do something smarter with the ~DXT1 texture. Currently, I just store the luminance as RGB like you'd expect. But ~DXT1 has 565 end points, so you don't get true greys. Also, you could do some clever maths with dot-products to get higher precision. However, the thing to remember is that we do a scale&bias on this value - most of the errors are very close to 1.0. So in fact I suspect any extra precision and non-greyscale effects are insignificant in practice.\n\nThis also makes me wish we had some texture formats that were less than 4 bits per texel, even at the cost of accuracy. Those would be ideal for storing the luminance correction term. I guess you could use 332 to store three different luminance channels (2.7 bits per texel), but the logistics of sharing textures gets tricky.\n\nYou could also pack three luminance channels into a ~DXT1 - the problem is then you have to compress 3D data into ~DXT1, and avoiding that was the whole point of this exercise! However, if you didn't mind the higher compression cost, it might work quite well. The thing to realise is that most 4x4 blocks of luminance correction have very little data - most of the time, the upsampled 565 is very close to correct. It's only a minority of blocks that have correction data. So if you choose your three textures well (an exercise left to the reader :-), you can end up with very few blocks that have data in more than one of the 3 channels, and so still get good quality. Cost would be very small - it's 4 bits per texel ~DXT1, shared between 3 textures, and then 1 bits per texel for 4x4 shrink 565, for a total of 2.7 bits per texel!
Re-reading the [[Visibility and priority tests for streaming assets]] entry I realise I didn't actually go into why the SVT/Megatexture stuff wasn't all you need for good streaming. It's sort-of obvious, but it's worth spelling out for clarity.\n\nIf you look through the list of the problems I faced and the ways I fixed them, many of them illustrate why the SVT/Rage systems are fine, but aren't sufficient. They tell you what is visible right now, but as-is they give you no advance warning, and drive heads take significant time to seek. You could add some heuristics to prefetch adjacent 64x64 blocks, but it seems both simpler and more flexible to use game-specific knowledge to do that prefetching rather than trying to reverse-engineer from the visibility data.\n\nIn practice what id actually do is to have a two-level system. First they use methods just like the ones I talked about to stream texture data off disk into a large cache in memory. The difference is that this data is not directly usable by the graphics card, it's in a ~JPEG-like format, and it's also multi-channel (they have the diffuse, specular and normal maps all squashed together in a single blob). I seem to recall also that the chunks they load off disk are larger, e.g. if the virtual texture system pages on 64x64 granularity, the disk streaming happens on 256x256 blocks (sorry, I forget the actual sizes, and Google isn't helping). This gives the compression more data to work with and increases compression ratios.\n\nThen each frame they use the visibility data back from the card to decide which of these chunks need to be decompressed from the multi-channel DCT blocks to the multiple ~DXTn textures, and on the PC and ~PS3 these are then uploaded to video memory. This takes processor time, but it's relatively low-latency compared to a drive-head seek, so I'm not sure if they use any prefetching heuristics here at all. Using a custom compression format is a really interesting thing to and allows you to reduce drive bandwidth and maximise the amount of data you can store in available memory, which means your prefetching can do an even better job of coping with unpredictable actions.\n\nSean Barrett has an even more extreme version of compression. His presentation was also his demo, and each "slide" is simply a billboard in the world that the camera looks at, and then to do the demo he just pulls the camera back and shows you the rest of the world. The clever thing is the slides are not stored as bitmaps, they're just the actual text with markup. That is, when the SVT system says "I need the 64x64 chunk starting at 2048,192 in mipmap level 5" the system actually goes and renders the text vectors onto the bitmap right there and then, at whatever resolution it's asked for. It's difficult to beat that sort of compression ratio, and you could see it being used in real game examples for newspapers or shop signs or book spines in a library.\n\nSo the two systems (disk->sysmem and sysmem->vidmem) are largely orthogonal, and indeed you don't need the cunning visibility feedback or page-table-based virtual texturing methods to get some of the benefits. One thing I always wanted to try adding to the streaming system was this idea of the two-level cache. In Blade II we did do a quick'n'dirty zlib decompression of all data coming from the disk, but that happened the instant the data was loaded. What you'd want to do is only do that decompression when the texture was actually needed for rendering, but otherwise leave it in memory still compressed. ~DXTn will get around a 2:1 compression with zlib, so it's a worthwhile thing to do, and of course it's lossless. Mesh data compresses even better, and there's some very clever schemes for compressing things like index buffers. Another obvious trick is that normal maps are often rendered using ~DXT5 textures with the U channel in the green and the V channel in the alpha, with the red and blue channels unused (though this trick isn't necessary when you have the ~BCn formats available). Clearly the compressed version can discard those extra channels.\n\nAnd you can go much further and start generating texture on the fly - procedural generation, aging effects, dynamic scratches, bullet holes, etc. I wrote an article in Game Programming Gems 3 called "Unique Textures" about this that I should probably webify one day.
Louise Forsyth (nee Rutter) is my //darling// wife (dear, please stop peering over my shoulder). We're coming up to our 12th anniversary together, and just past our 2nd year married. Living in sin was such fun, but it's all over now. Conventional at last. She's meant to be a vet, but the US authoritays don't realise that UK cats and dogs are uncannily similar to US cats and dogs and want her to pass some more exams. When my mum told me that a Cambridge degree was the key that unlocked the world, I believed her. They fuck you up, your mum and dad :-)\n\nShe has a [[hiking blog|http://eelpi.livejournal.com/]] which has some gorgeous pictures of the surrounding countryside. Volcanos rock. Quite literally.
That's me - Tom Forsyth. I'm now pretty much permanently TomF though - even when I write emails to people like my parents or my wife. It started at MuckyFoot where I was the second Tom to join, and Tom Ireland had already nabbed tom@muckyfoot.com, so I had to make do with tomf@muckyfoot.com. However, Tom Ireland would get a constant stream of mail meant for me, so I started to make a point of always signing myself TomF in the hope that people would remember to add the "F". It only partly worked, but the habit's kinda stuck now. Of course then I worked with Tom Fletcher at Intel, so that put a spanner in the works. But there's no other TomF's at Valve, so we're back to normal.\n\nThe main URL for my site is http://eelpi.gotdns.org/ When you go there, it redirects to whatever ISP I am paying money to this year, so don't use whatever address you see in your browser bar - stick to this one, coz it will move around as I do.\n\nMore:\n* EmailMe.\n* My surname has NoE.\n* Where I work: [[Valve|http://www.valvesoftware.com/]].\n* What I work on: [[wearable computing|http://blogs.valvesoftware.com/abrash/valve-how-i-got-here-what-its-like-and-what-im-doing-2/]]\n\nDisambiguation. These are me:\n* [[@tom_forsyth|http://twitter.com/tom_forsyth]]\n* The Tom Forsyth who is on various game-related programming mailing lists.\n* The Tom Forsyth who has written and edited programming book articles.\n* The Tom Forsyth who has talked at GDC.\n* The Tom Forsyth who worked at [[MuckyFoot]], [[RadGameTools]] and [[Intel|http://www.intel.com]].\n* The Tom Forsyth who worked on [[Urban Chaos]], [[StarTopia]], [[Blade II]], [[Granny3D]] and [[Larrabee]].\n* The Tom Forsyth or Andrew T Forsyth on various patents to do with processor tech (through Larrabee).\n* The Tom Forsyth who was a Microsoft MVP.\n\nThese are NOT me:\n* The Tom Forsyth who works/worked for Nokia.\n* The [[Tom Forsythe|http://www.tomforsythe.com/]] who takes photos of naked Barbie dolls.\n* The [[Thomas Forsyth|http://www.thomasforsyth.co.uk]] who makes furniture.\n* The [[Tom Forsyth|http://en.wikipedia.org/wiki/Tom_Forsyth]] who scored the winning goal in the 1973 Scottish Cup Final.\n* I am not knowingly related to these or any other famous Forsyth or Forsythe (Frederick, Bruce, John, etc).
AKA TomF. http://eelpi.gotdns.org/
I was talking to someone about simple in-game lighting models the other day and we realised we had totally different terminology for a bunch of lighting tricks that we'd each independently invented. With my instinctive loathing for reinventing the wheel, I wondered why this had happened. And the reason is that some lighting tricks are so simple they don't warrant a proper paper or a book article or a GDC talk, so they never get passed along to others. So I thought I'd write up a light type I've been using for a while that I call a "trilight" that I've not seen anyone else talk about. It's very simple, and I'm sure others have invented it too, but at least now I'm happy I've documented it. It's on my [[papers|http://www.eelpi.gotdns.org/papers/papers.html]] page, and it's got a simple demo to go with it.\n\nBy the way, it's really satisfying to write a small demo and a short paper and do it all start to finish in half a day for a change. Normally I tackle big problems like shadowing, which take months to code and lots of explanation. Excellent way to spend a Sunday afternoon - more people should try it.
Supposedly from the very famous "Four Yorkshiremen Sketch" from Monty Python. Reproduced here in its entirety just because it makes for easier cut'n'pasting. TheWife will kill you if you describe her as a Yorkshireman, because Yorkshire is on the ''OTHER SIDE OF THE FUCKING HILL''. She, as any well-brought-up midlander will be able to instantly tell from her authentic Estuary/Fenland accent, is from Lancashire. But that's an amusing anecdote for another time. Meanwhile, pop the stack and on with the primary anecdote:\n\nThe Players:\n Michael Palin - First Yorkshireman;\n Graham Chapman - Second Yorkshireman;\n Terry Jones - Third Yorkshireman;\n Eric Idle - Fourth Yorkshireman;\n\nThe Scene:\n Four well-dressed men are sitting together at a vacation resort.\n 'Farewell to Thee' is played in the background on Hawaiian guitar. \n\nFIRST YORKSHIREMAN:\n Aye, very passable, that, very passable bit of risotto.\nSECOND YORKSHIREMAN:\n Nothing like a good glass of Château de Chasselas, eh, Josiah?\nTHIRD YORKSHIREMAN:\n You're right there, Obadiah.\nFOURTH YORKSHIREMAN:\n Who'd have thought thirty year ago we'd all be sittin' here drinking Château de Chasselas, eh?\nFIRST YORKSHIREMAN:\n In them days we was glad to have the price of a cup o' tea.\nSECOND YORKSHIREMAN:\n A cup o' cold tea.\nFOURTH YORKSHIREMAN:\n Without milk or sugar.\nTHIRD YORKSHIREMAN:\n Or tea.\nFIRST YORKSHIREMAN:\n In a cracked cup, an' all.\nFOURTH YORKSHIREMAN:\n Oh, we never had a cup. We used to have to drink out of a rolled up newspaper.\nSECOND YORKSHIREMAN:\n The best we could manage was to suck on a piece of damp cloth.\nTHIRD YORKSHIREMAN:\n But you know, we were happy in those days, though we were poor.\nFIRST YORKSHIREMAN:\n Because we were poor. My old Dad used to say to me, "Money doesn't buy you happiness, son".\nFOURTH YORKSHIREMAN:\n Aye, 'e was right.\nFIRST YORKSHIREMAN:\n Aye, 'e was.\nFOURTH YORKSHIREMAN:\n I was happier then and I had nothin'. We used to live in this tiny old house with great big holes in the roof.\nSECOND YORKSHIREMAN:\n House! You were lucky to live in a house! We used to live in one room, all twenty-six of us, no furniture, 'alf the floor was missing, and we were all 'uddled together in one corner for fear of falling.\nTHIRD YORKSHIREMAN:\n Eh, you were lucky to have a room! We used to have to live in t' corridor!\nFIRST YORKSHIREMAN:\n Oh, we used to dream of livin' in a corridor! Would ha' been a palace to us. We used to live in an old water tank on a rubbish tip. We got woke up every morning by having a load of rotting fish dumped all over us! House? Huh.\nFOURTH YORKSHIREMAN:\n Well, when I say 'house' it was only a hole in the ground covered by a sheet of tarpaulin, but it was a house to us.\nSECOND YORKSHIREMAN:\n We were evicted from our 'ole in the ground; we 'ad to go and live in a lake.\nTHIRD YORKSHIREMAN:\n You were lucky to have a lake! There were a hundred and fifty of us living in t' shoebox in t' middle o' road.\nFIRST YORKSHIREMAN:\n Cardboard box?\nTHIRD YORKSHIREMAN:\n Aye.\nFIRST YORKSHIREMAN:\n You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t' mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi' his belt.\nSECOND YORKSHIREMAN:\n Luxury. We used to have to get out of the lake at six o'clock in the morning, clean the lake, eat a handful of 'ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!\nTHIRD YORKSHIREMAN:\n Well, of course, we had it tough. We used to 'ave to get up out of shoebox at twelve o'clock at night and lick road clean wit' tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit' bread knife.\nFOURTH YORKSHIREMAN:\n Right. I had to get up in the morning at ten o'clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.\nFIRST YORKSHIREMAN:\n And you try and tell the young people of today that ..... they won't believe you.\nALL:\n They won't!\n\n\n[[Geek culture|http://apple.slashdot.org/comments.pl?sid=138720&cid=11609478]] - fantastic!\n\nThe attentive will have noticed that the phrase "uphill both ways" does not in fact appear in this sketch. That's because a fairly cursory bit of research will reveal that it is from a fairly similar but entirely unrelated Bill Cosby sketch. Bill Cosby is not a native Yorkshireman. Not a lot of people know that.
Released on PC, ~PS1, Dreamcast. I joined [[MuckyFoot]] just before the PC version was finished, so I didn't have much to do with the game itself, but it's got a lot of neat stuff in it - I enjoyed playing it. I did the Dreamcast port after the PC version had shipped - just me and Karl Zielinski testing in nine months start to finish. The Dreamcast was a really nice machine to work with - it just did what it was meant to without any fuss - a big contrast to the drama the ~PS1 version was going through. It's a shame the DC died - it was a nice elegant bit of kit.
The [[Utah Teapotahedron|http://www.ics.uci.edu/~arvo/images.html]] is an awesome demo model. It's easily recognisable, so it's easy to tell when you've screwed up a scaling or are clamping N.L the wrong way or managed to get your backface culling wrong. It's got insta-create functions in 3DSMax and D3DX, and the spline data is easily available (see [[Steve Baker's excellent homage page|http://www.sjbaker.org/teapot/]]). It has areas of all types of curvature, it can project shadows onto itself really nicely (handle, spout, lid handle). And it's easy to throw a texture map onto it - apart from the pole on the lid, it's easy to parameterise.\n\nAlso, it helps a pet gripe of mine - algorithms that require non-intersecting watertight manifolds. The teapot is not closed and it self-intersects (the handle punches through the side). So it's really useful for pounding on algorithms that don't like either of those. Getting artists to generate watertight manifolds is a horrible job - please could all the researchers out there stop developing algorithms that require it? If your algorithm doesn't even handle the sixth platonic solid, it's not much use to anyone, is it?\n\nFrankly, the only reason [[Lenna|http://www.lenna.org/]] is a better icon for computer graphics is that she's been around longer, and she's naked.
I keep having this stupid conversation with people. It goes a little something like this:\n\nThem: I need to get a new card - one with dual-DVI. Any ideas?\nMe: You really need dual DVI?\nThem: Yeah, VGA's shit.\nMe: VGA's actually pretty good you know - are you running at some sort of crazy res?\nThem: Yeah - 1600x1200 - it's a blurry mess.\nMe: That's not crazy at all. It's well within VGA's capabilities.\nThem: No, analogue is crap for anything like that. You have to go digital. Especially if you're reading text.\nMe: Dude, at home I run an Iiyama 21-inch CRT at 1600x1200 at 85Hz on a VGA cable. I can write code all day with black text on white backgrounds. At work I run my second Samsung 213T off a VGA cable as well, and that's the screen I use for email - black on white again. They're both crystal-sharp.\nThem: Rubbish. I just tried it myself - it's an unreadable blurry mess at even 60Hz.\nMe: Are you by any chance ... using an nVidia graphics card?\nThem: Sure, but what's that go to do with it? Have you got some ninja bastard card?\nMe: An elderly and perfectly standard Radeon 9700.\nThem: I've got a 7800 GT - it should kick the shit out of that.\nMe: Yes, at shuffling pixels. But it's got an nVidia RAMDAC. Which is a large steaming pile of poo.\n\nSeriously - what the hell is up with nVidia and their RAMDACs? They've been shit right from day one in the NV1, they were shit when I worked at 3Dlabs and the $50 Permedia2 had an infinitely superior display quality to their top-end GeForce2s, and they've continued that grand tradition right up to current cards. That was acceptable when a RivaTNT cost $50, but now they're asking $1000 for an SLI rig. My boss was trying to get two monitors hooked up to a fancy nVidia card that only had one DVI port on it, and whichever monitor he plugged into the VGA port was ghosting like crazy. Swap the card out for a cheap no-frills Radeon X300 and hey - lovely picture on both.\n\nNow you're going to think I'm an ATI fanboi. And I am, because I like elegant orthogonal hardware. But I'm not syaing ATI RAMDACs are great - it's just that they don't suck. Matrox, 3Dlabs and Intel all have decent RAMDACs. Even the S3 and PowerVR zombies have better RAMDACs. Beaten by S3! That's absurd.\n\nFor example, a colleague had a high-spec "desktop replacement" laptop with an nVidia chipset of some sort that he could never get to drive his cheap CRT with half-readable text. Naturally he blamed the monitor. He's recently replaced it with a new Dell 700m, and it drives the CRT wonderfully. This is a $5 Intel graphics chip in a laptop! It's totally worthless as a 3D card, but even it does orders of magnitude better than the nVidia cards at running a CRT.\n\nThe one time nVidia cards have decent RAMDACs is when it's by someone else. Some of their "multi media" cards with the fancy TV in/out stuff have a nice external RAMDAC made by someone else, and apparently (never tried them myself) they work just fine. I'm all for new tech, but we've all been bounced into switching to DVI has for such bogus reasons - monitor sizes just aren't growing that fast.\n\nSo if someone tells you that old steam-powered analogue VGA is totally obsolete because DVI quality is just sooooo much better, ask them if they've got an nVidia graphics card.
I've added a version of my VIPM comparison article to my website in the [[papers section|http://www.eelpi.gotdns.org/papers/papers.html]] down the bottom. It's always nice to have articles "alive" and searchable by engines. I added some hindsights about the process of integrating VIPM into Blade, since the article was written before that had been done.
*** Article under construction ***\n\n\nI've noticed in a lot of discussion the trouble with naming these various technologies. Here's the words we in the Wearable Computing group use:\n\n~Head-Mounted Display (HMD): general term for any sort of display that is fairly rigidly mounted on your head. May be one eye or both eyes, and use a wide variety of display technologies. Usually has some form of orientation and/or position tracking so that it knows where you're looking. Without the tracking it's really just a "private TV" - let's ignore those for the moment.\n\nVirtual Reality (VR): You are immersed in the virtual world and you really don't want to see the real world - it would be distracting. The exception to this is what we call "not being blind" - if the phone rings, or you want to pause to drink your cup of coffee or the cat suddenly jumps into your lap - then you want some instant and easy way to be able to see the world, but you probably don't need to also see the virtual one, so something as crude as a flip-up visor (as on the ancient Forte ~VFX1) would work fine.\n\nAugmented Reality (AR): Like VR, you can see a virtual world in a head-mounted display, but the displays are constructed so that they let in light from the real world as well - usually through half-silvered mirrors or similar. This allows virtual information or objects to be displayed on top of the real world. As [[Michael Abrash said in his blog|http://blogs.valvesoftware.com/abrash/why-you-wont-see-hard-ar-anytime-soon/]], the biggest problem here is that you cannot occlude the real world - there's no opacity controls.\n\nMediated Reality (MR): Similar to AR, except that instead of real-world photons travelling direct to your eyeball, a camera turns them into a video stream, the virtual world gets rendered over the top, and the result displayed to your eyeballs. You're still wearing an HMD and immersed in the world. This is significantly simpler than AR because the registration can be better and you can do opacity, but the question is whether the extra latency and resolution loss of going through the camera->display path is acceptable for everyday use. I wouldn't want to cross the street wearing one, let alone drive a car. However this is an obvious way to modify VR to "not being blind", since it's fairly easy to slap a camera onto a VR headset.\n\nAugmented Video (AV): Not a standard term, but it's the one we like. This is like Mediated Reality in that you have a video feed and you add something over the top and then show it to the user, but the difference is that it's not rigidly mounted to your head, and it may only be showing a portion of the view, not all of it. There's a ton of this happening right now on smart phones, and it's really cool. Buuuuut... I'm going to be a curmudgeon and complain and say it's not "real VR" - it's orders of magnitude easier than the immersive definition of VR above. So calling it "VR" gives people false hopes - "hey we have VR on an iPhone - can't be long before it's everywhere". Maybe in a few years we'll have to give up and call this VR and the definition above will need a new name - Immersive Virtual Reality maybe?
Woohoo! Finally got off my arse and wrote a paper without some editor nagging me that it was two weeks overdue!\n\nIn my day job doing [[Granny3D]], we have a mesh preprocessor, and I was looking for a vertex-cache optimiser to add to it. But the only ones I knew of were either too complex for me to understand, too special-case, or proprietary. I'd written a vertex-cache optimiser thing with lookahead before, and it was mind-numbingly slow. Truly, awfully slow. Even with a lookahead of only six triangles, it was pathetically slow (I tried seven - it didn't finish after three hours running time). So I thought well, I'm not doing one of those again.\n\nSo I tried just writing a no-lookahead, no-backtrack one, because anything is better than [[a slap in the face with an angry herring|Strippers]], and I basically made up some heuristics that I thought would work. And it behaved scarily well. So I tweaked it a bit, and it went better. And then I wrote a thing to anneal the "magic constants", and it went really really well. So well, it got within ten percent of the optimal result. Then I compared it to the variety of scary (but excellent) papers I'd read and it was really close to them. I haven't done an apples-to-apples comparison yet, but the eyeballed mammoths-to-heffalumps comparo looked rather promising. And it's orders of magnitude faster, and way easier to understand. And then [[Eric Haines|http://www.acm.org/tog/editors/erich/]] asked a question about this on the ~GDAlgo list, and that was too much for me to stand, so I wrote the thing up and put it [[here|http://www.eelpi.gotdns.org/papers/fast_vert_cache_opt.html]]. I intend to keep it relatively "live", so if you have any feedback, fire away.
When streaming data from disk, it's important to figure out two things that are related but slightly different - which assets you'd like to have loaded, and at what priority. Priorities are used to decide which assets get loaded when memory is short and you can't load everything, which ones to throw away to free up space, and they're also used to decide what order to load things in when you don't have enough disk bandwidth. And as we all know, you never have enough memory or disk bandwidth.\n\nFor textures with mipmaps, determining which mipmap you need is reasonably efficient to do as an approximation. I cover that in the blog entry [[Knowing which mipmap levels are needed]]. You could also do it by demand-paging with virtual textures, for example Sean's [[Sparse Virtual Textures|http://silverspaceship.com/src/svt/]] or id's Rage engine ("megatextures"), but there's problems with those which I will get to at the end.\n\nYou can do a similar sort of cheap approximation with mesh ~LODs. If you use progressive meshes the collapse error metric gives you an error in real-world distances. If you use artist-generated ~LODs they usually specify the screen height in pixels that each model is designed to be viewed at.\n\nSound ~LODs are trickier, as the only thing that fades with distance is the volume, which does not correspond to quality in an obvious way. There's a minimum-volume cutoff, but that's not really the same thing. In general you will need some sort of "obnoxiousness level" specified by the sound folks for sound ~LODs. Only having 2 footsteps loaded instead of 10 may be a low-obnoxious change, but having 2 enemy grunts instead of 10 will be significantly more obnoxious.\n\nAfter much fiddling, I decided to have two different priorities for each asset.\n\nOne is the priority for the /currently visible/ LOD (whether or not it's loaded!) - this is a measure of how bad stuff will look right now if you fail to load this. So it's something like a pixel-error metric. However those are tricky to measure and ballpark, so a more practical unit is "meters away from the standard game camera". It's much easier to empirically tweak those, because you just display both ~LODs side by side and move the camera to the point where you don't really notice the difference in quality. Biasing textures for faces and logos up, and textures for dirt and concrete down is a valid thing to do and works well. If I were as smart as [[Charles|http://cbloomrants.blogspot.com/]] I'd automate it and find the PSNR difference between adjacent mipmap levels and use that to bias the priority.\n\nThe other priority is the one for the next highest LOD to the visible one - the one you don't yet need. This one can be expressed different ways but for me it's "how many seconds away is the player from noticing this is missing" (or rather, the inverse - and then multiply by the currently-visible LOD's priority of course). If you have everything that is actually visible loaded, you then want to prefetch assets to fill the free space with available. The stuff you want to load first is the stuff that the player will be able to see soonest. Normally you get this time by dividing the distance of the object from the player by the speed the player can move at. However, if there's a locked door in the way, and the unlock-door animation takes two seconds to play, you can add that on to the simple distance measurement (as long as the door is still locked). Until the player is actually at the door to start unlocking it (or maybe one second away from the door), there's no point loading anything on the other side, no matter how close it is.\n\nSo each asset has two entries in the priority queue - one that answers the question "If I'm running out of memory, what should I unload to make room." and the other answers the question "I have spare space - what shall I fill it with". The two are slightly different.\n\nNote that for textures you can ''prove'' you do not need a certain mipmap level on a given frame - there is literally no point loading it, the graphics card will not touch that data. For mesh LOD, you can always theoretically see the improvement by using the higher LOD. But the "seconds before you can see the difference" blurs that distinction anyway. In practice every type of priority gets a bunch of tweak knobs applied to it and you'll spend a while tweaking them until "it looks right".\n\nThe other interesting question is what you measure distance in. Simplest is linear distance, but that ignores visibility - if something is on the other side of a wall, it'll get loaded anyway, which is suboptimal. But at the other end of the scale are the virtual texture systems. In theory they're great - they load only the parts of textures that are literally visible. Unfortunately, streaming off disk needs a few seconds of prefetch, and they're useless at providing that data. So they're great at low-latency stuff like decompressing ~JPEG->~DXTn and uploading to a video card, but not so good for deciding what to load off DVD.\n\n\nA bunch of things I discovered along the way, and how I solved them:\n\n''1. The player is camping an exit. But a baddie has been smarter and sneaks up behind them. The player hears their footsteps, turns around and... attack of the blurry low-poly monster!''\n\nDon't use the visible frustum to cull, just use omnidirectional distance (or distance-to-reach) to determine what ~LODs you need. Almost all game types can spin the camera around in less time than the drive head can seek.\n\nThe exception to this is textures that should be loaded but aren't. In this case, the streaming system has fallen behind or is running short of memory, you don't/can't have what you really want, and given the choice of loading texture A that is visible right now and texture B that is just as far away but isn't visible right now, load texture A first. This is one reason why I separate priorities into "what do I need now" and "what might I need in the future" - you bias the first by frustum, but not the second.\n\n''2. The player is running down a corridor, turns sharp left, and even stuff nearby is blurry.''\n\nUsing a PVS or portal system is great, but you need to modify the traversal when using it for streaming visibility. The way I did it was to first flood-fill out from the player by the time-to-get-there metric (e.g. 3 seconds of travel time, unlocking doors, etc), completely ignoring line-of-sight. Then line-of-sight check through each of the portals in the enlarged visible set, but again ignoring the view frustum (see problem #1). This means you pre-load visible things on the other side of currently-closed doors, around corners the player is close to, etc.\n\n''3. The player blows a door open with a rocket launcher, and...''\n\nIf anybody has property-destroying weapons readied, assume destructible things aren't there when doing visibility traverses.\n\n''4. The player shoots a car, the car seems to explode, but for a few seconds either the car looks undamaged, or the car simply isn't there.''\n\nYou need to "link" the destroyed-car-mesh (and textures) with the intact car mesh. If one is visible, make sure you assign a similar priority to the other. You could make this conditional on someone having a heavy weapon equipped if you were feeling keen.\n\n''5. The player lobs a grenade at something, and the explosion is blurry. The explosion texture loads just about the time the explosion is fading, then it immediately gets unloaded again - doh!''\n\nThe moment someone throws a grenade, start loading the explosion texture because you have about three seconds before you'll need them. Similarly, when someone pulls out any weapon, make sure you preload all the smoke & muzzle flash stuff associated with that weapon.\n\n''6. The HUD is sometimes blurry, e.g. the pause menu, the rarely-used "super power up" effect, etc.''\n\nNever unload any element of the in-game HUD. I tried a bunch of ways to stream the HUD, because it can take up quite a lot of texture space, but they all sucked because it's so incredibly obvious when you get it wrong. Just don't bother.\n\n''7. A cutscene where two characters are talking by radio, the camera cutting back and forth between them, every time it cuts the textures are blurry.''\n\nThis is really difficult, as is any game with teleportation or similarly rapid transport. You need to be able to throw multiple cameras into a scene and have them all do vis & priority checks, and take the highest priority of any of them. For cutscenes you need to have a camera playing the cutscene several seconds ahead of where the player is seeing it. Cutscene systems that work by modifying the game world will really hate having two cutscenes playing at the same time. The right answer is "don't do that" - make sure you have the ability to have multiple worlds with a cutscene running in each. But of course that's frequently absurdly impractical and would take major re-engineering.\n\nThe way I did it on Blade II was pretty hacky, but did work. I had a compile-time switch that recorded the camera position when cutscenes were playing and streamed them to a file. Then with a second compile-time switch it would play back that file with a three second lead time, doing vis & priority checks from that prescient camera. This gets the scenery right, but for example if a car is spawned and speeds towards the stationary camera it won't work (because the camera is in the right place in the future, but the car isn't). So this second mode remembers the filenames of what didn't get streamed in time and their max ~LODs at each point in time, and dumps that to a second file. Merge the two files and then in the normal build you play back that file three seconds ahead. Obviously you have to redo this process every time assets or cut-scenes change. You do still need to record the camera, because you may have ~NPCs or environment that can change dynamically (e.g. looking at the devastation the player has wrought behind them). This is a horrible hack, but it worked well enough to ship!
It's a Wiki. Sort of. A Wiki is a collaborative document with really simple editing and hyperlinking so that even idiots like me can create documents with lots of confusing links to random pages. The simplest way to make a new link is to WriteInCamelCapsLikeThis, which is why some of the words seem really stupid. Just learn to live with it - the less time I have to spend thinking about formatting, the more likely I am to deign to write down my peals of wisdom, and the more likely you are to be raised to enlightenment. You lucky people.\n\nQuick guide:\n* click on blue links to open new topics (technically called "tiddlers").\n* hover over open topics to get a little toolbar in the top right with a list of actions.\n* click on "close" to close the current topic.\n* click on "close others" to close everything but this topic.\n* some pages have tags which categorise them. This one has the tag "help". You can select the "Tags" field on the far right and see a list of all the tags, and browse the topics that use those tags. Handy if all you want is tech stuff and not a lot of the random ramblings that go with them.\n* experiment - there's no way you can harm this Wiki by playing - even if it looks like you can (because you can't save your changes).\n\nNote it's only sort-of a Wiki, because although it looks like you can edit it, you can't save your edits. It lives on my server, so only I can edit it, so ner! Obviously you can save them to your hard-drive, but that's unlikely to be that interesting - nobody else will see them. If you want to play with real Wikis, there's the original WikiWikiWeb (http://c2.com/cgi/wiki) from which all are Wikis ultimately descended, and probably the most famous is the Wikipedia (http://en.wikipedia.org/wiki/Main_Page), which is an Encyclopedia that is a Wiki - created for geeks by geeks, and thus contains far more about about Pokemon characters (http://en.wikipedia.org/wiki/Jigglypuff) than it does on ball gowns (http://en.wikipedia.org/wiki/Ballgown).\n\nFor the technically-minded, this particular breed of Wiki is a TiddlyWiki, and you can read all about it at http://www.tiddlywiki.com/
I didn't say the pages contained anything useful. Move your mouse over this text - see how there's a line of text starting with "close" to the upper left? Click the word.