Man - this blog isn't even a day old and I'm already getting shit about the information overload caused by using sophisticated new Wiki technology. One doesn't like that he can't read it on his Lunix box, another moans that there's too much stuff they can click on and can't I just use a text file?\n\nNo dammit! I like snazzy new gratuitous tech. This blog is going to be mainly about snazzy new gratuitous tech (with the occasional UphillBothWays nostalgia moan added for good measure). And since this is a thing as totally meaningless and self-centered as a blog, I'm going to use it like I stole it. Er... which I did (thanks [[Jeremy|http://www.tiddlywiki.com/]]). It's not like I'm demanding you use MS tech or anything hentai like that.\n\nEr... news? No, I have none of that. Except that I went to see my accountant today and he said I owed The Man five digits. OK, so as an immigrant Brit living in the US seeking asylum from crappy pay and on a "you can stay if you don't cast nasturtiums on Mr. POTUS" visa making money for US companies and spending all my hard-earned cash on other US companies, would it be slightly impolitic to utter the phrase "No taxation without representation?" Yeah, thought not. Even after I get my green card, I have to tell HRH Brenda to shove it before I get to cast a pointless vote in my local United Franchise of America. Gonna book me a plane ticket to Boston, gonna tip a crate of your shitty bourbon into the harbour. Yes, "harbour". It's got a "u" in it. Blow me. Where's my Laphroaig, [[wench|TheWife]]?\n\nTo be fair, it's all because I got a mildly obscene Christmas bonus on account of the SuperSecretProject. You so want my boss. But you can't have him. He's mine. All mine!
Double precision - it's not magic. But people keep invoking it like it is.\n\nFloating-point numbers are brilliant - they have decent precision and a large range. But they still only have 32 bits, so there's still only 4 billion different ones (actually there's fewer than that if you ignore all the varieties of NANs, assume infs are basically bugs, and turn off denormals because they're agonisingly slow). So they have tradeoffs and weaknesses just like everything else. You need to know what they are, and what happens for example when you subtract one small number from another - you get imprecision. One "solution" is to turn on double precision, because then you get more bits. Yeah, but you've just moved the problem around. Doubles have exactly the same weaknesses - all you've done is shuffle the problems into a corner, stuck your fingers in your ears and yelled "lalalalalala". They'll still come back to bite you, and because it's double precision, the bugs will be even rarer and even harder to track down. And doubles are slower on most machines, so you've hurt your execution speed.\n\nThe right thing to do is go read a book on numerical precision. The basics are - any time you do a subtract (or an add, implicitly), consider what happens when the two numbers are very close together, e.g. 1.0000011 - 1.0. The result of this is going to be roughly 0.0000011 of course, but the "roughly" is going to be pretty rough. In general you only get about six-and-a-bit decimal digits of precision from floats (2^23 is 8388608), so the problem is that 1.0000011 isn't very precise - it could be anywhere between 1.0000010 and 1.0000012. So the result of the subtraction is anywhere between 1.0*10^-6 and 1.2*10^-6. That's not very impressive precision, having the second digit be wrong! So you need to refactor your algorithms to fix this.\n\nThe most obvious place this happens in games is when you're storing world coordinates in standard float32s, and two objects get a decent way from the origin. 
The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera, it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation.\n\nBy the way, some rendering code doesn't even do that subtraction. Canonically, you bake that subtraction into the offset of your "camera matrix" in your rendering code and let the GPU do it. This is madness - GPUs have even less precision than full IEEE754 floating-point, and you're doing a whole bunch of vector math before you finally do the subtraction. So your precision problems are going to be a lot worse. Best all round if you just subtract camera position from object position on the CPU before it gets anywhere near a GPU.\n\nThe solution is obvious - if single precision isn't enough, switch to storing your positions in double precision! Well er yeah, that kinda works. Mostly. But there's some problems. The most obvious is that some machines simply don't do doubles (PS2), and on machines that do support doubles, there's usually speed penalties.\n\nSo what's the alternative? Well, good old fixed-point numbers. For storing positions, they're great. You only need to subtract positions from each other, and add velocities to positions. You almost never need to do crazy stuff like multiply two positions together or anything - that doesn't even "mean" anything. 64-bit fixed-point integers are supported on pretty much every platform. 
Even if they're not natively supported, they're easily emulated with things like add-with-carry instructions (and those platforms will have awful float64 support anyway).\n\nPlus they have the big advantage that you actually get all 64 bits of precision. A float64 only has 52 bits of precision. But I hear you say - it's not constant precision - floating-point numbers have adaptive precision. Yes they do - and that's actually a problem. The problem is that your code works differently depending on your distance from an arbitrarily-chosen point in space - the origin.\n\nThis has huge implications for robustness and testability. If you're like me, when you develop code, you do so in a tiny sandbox game level so you don't have to spend five minutes loading the assets for a full level. And yay - everything works fine. The physics works, the gameplay movement works, there's no falling-out-of-maps stuff, everything is smooth. So you check it in and then several weeks later someone says there's some shitty stuff happening in level X. So you test level X and it seems fine, and then you have the "works on my machine" argument with the tester, and it takes you several days to figure out that it only happens in one part of level X, and that's the part where you're a long way from the origin and some part of the physics code is subtracting two large numbers and getting an imprecise result.\n\nThe nice thing about fixed point is it's consistent. You get the same precision everywhere. If your time step and physics epsilons work at the origin, they work everywhere. And before you moan that fixed-point doesn't have the range - 32 bits of fixed-point gets you anywhere on Earth to about 3mm. Not enough? Well 64 bits of precision gets you to the furthest distance of Pluto from the Sun (7.4 billion km) with sub-micrometer precision. 
And it's all consistent precision - no lumpy parts or fine parts or places where you suddenly get denormals creeping in and your performance goes through the floor for no well-identifiable reason.\n\nI think the easiest way to encapsulate this (if you're an OOP fan) is to make a "position" class that holds your chosen representation, and the only operations you can perform on that class are subtracting one from another to give a standard float32 vec3, and adding a float32 vec3 to a position to get another position. So it means you can't really use a position without doing calculations relative to another position. You should find that that is all you actually need. If not, then it's time to refactor some code, because you're likely to have problems when doing things at different distances from the origin.\n\nThe one thing I have seen is people making BSP trees out of the entire world, and then storing the planes as A,B,C,D where: Ax+By+Cz+D>0. Which is fine, except that's also going to have precision problems. And it's going to have precision problems far earlier, because x,y,z are another object's position. And now you're multiplying two positions together and subtracting a bunch of them and asking if they're positive or negative, and the problem with that is that you will halve your available precision. So even doubles will only get you 52/2 = 26 bits of precision, which is rubbish. And experience with BSP trees has shown that they're extremely intolerant of precision errors. So yeah - that sort of stuff just makes me wince. You really do need to store a point on the plane and the plane's normal. Otherwise even decently-sized levels are going to end up in agony (Michael Abrash has a story exactly like this about Quake 1 - they had tiny tiny levels and they had problems!)\n\nSo I had a point several decades ago, and that was that I have a nice compact OffendOMatic rule: any time you use doubles in a game, you haven't understood the algorithm.
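A minimal sketch of the "position" idea above, in plain C rather than a class. The names (position, vec3, position_sub, position_add) and the choice of 16 fractional bits are my assumptions for illustration, not anything from a shipped engine. The only two operations are the ones described: position minus position gives a float32 vec3, and position plus float32 vec3 gives a position.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scale: 16 fractional bits, so 1 unit = 1/65536 metre. */
#define UNITS_PER_METRE 65536.0f

typedef struct { float x, y, z; } vec3;        /* relative offset, float32 metres */
typedef struct { int64_t x, y, z; } position;  /* absolute position, fixed point  */

/* Round one float32 offset in metres to the nearest fixed-point unit. */
int64_t metres_to_units(float metres)
{
    float scaled = metres * UNITS_PER_METRE;        /* power-of-two scale */
    assert(scaled > -9.0e18f && scaled < 9.0e18f);  /* must fit in an int64 */
    int64_t whole = (int64_t)scaled;                /* truncates toward zero */
    float frac = scaled - (float)whole;
    if (frac >= 0.5f) whole += 1;
    else if (frac <= -0.5f) whole -= 1;
    return whole;
}

/* Convert a (small) fixed-point difference back to float32 metres. */
float units_to_metres(int64_t units)
{
    return (float)units / UNITS_PER_METRE;
}

/* position - position = vec3. The integer subtraction is exact, so the
   only rounding happens on the small result, never on the big inputs. */
vec3 position_sub(position a, position b)
{
    vec3 d;
    d.x = units_to_metres(a.x - b.x);
    d.y = units_to_metres(a.y - b.y);
    d.z = units_to_metres(a.z - b.z);
    return d;
}

/* position + vec3 = position. */
position position_add(position p, vec3 v)
{
    p.x += metres_to_units(v.x);
    p.y += metres_to_units(v.y);
    p.z += metres_to_units(v.z);
    return p;
}
```

With this layout, moving an object by 1mm and then asking how far it moved gives the same answer whether the object sits at the origin or a billion metres away, which is the consistency argument in a nutshell. There is deliberately no way to multiply two positions together.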
Someone prodded me about this the other day, so I thought I should get on and do it. I gave a GDC 2003 talk about SH, but I was never really happy with it - half an hour isn't really enough to cover it well. I haven't changed the slides, but now I have some notes for it. I corrected an error, finally wrote those elusive fConstX values down, but mainly I talk about some surrounding details about the console implementation I actually used it in - what is probably the coolest bit of all - or at least it's the bit the artists really liked about the new system. [[Spherical Harmonics in Actual Games notes]]
The rest have been removed from the front page to save space. Here are all my blog entries in chronological order:\n\n[[GDC 09]]\n[[How to walk better]]\n[[Larrabee ISA unveiled at GDC 2009]]\n[[CAs in cloud formation]]\n[[Regular mesh vertex cache ordering]]\n[[Siggraph Asia 2008]]\n[[Plague]]\n[[Siggraph 2008]]\n[[ShaderX2 available for free]]\n[[Larrabee and raytracing]]\n[[Renderstate change costs]]\n[[Larrabee decloak]]\n[[Blog linkage]]\n[[Smart coder priorities]]\n[[Rasteriser vs Raytracer]]\n[[Texture formats for faster compression]]\n[[Knowing which mipmap levels are needed]]\n[[SSE]]\n[[Patently evil]]\n[[Trilights]]\n[[Bitangents]]\n[[More vertex cache optimisation]]\n[[Reinventing the wheel - again!]]\n[[Shadowbuffers and shaders]]\n[[Utah Teapot]]\n[[GDC survival guide]]\n[[Pixomatic slides]]\n[[Notepad++]]\n[[More on SH]]\n[[Licenses]]\n[[Added some hindsights and notes on Spherical Harmonics]]\n[[Cellular Automata article added]]\n[[Impostor article added]]\n[[Shadowmap vs shadowbuffer]]\n[[Vertex Cache Optimisation]]\n[[Strippers]]\n[[Scene Graphs - just say no]]\n[[Dodgy demos]]\n[[Premultiplied alpha]]\n[[Babbage was a true genius]]\n[[Someone cited my VIPM article]]\n[[VIPM article in HTML]]\n[[RSS part 3]]\n[[VGA was good enough for my grandfather]]\n[[RSS part 2]]\n[[A matter of precision]]\n[[Game Middleware list]]\n[[RSS banditry]]\n[[...there was a load of words]]\n[[In the beginning...]]
I've just been reading up on the Analytical Engine. I've known the background to Babbage's life and works for ages, and I knew he was incredibly clever - the Difference Engine is an amazing feat when you consider it's made of mechanical gears and powered by a human cranking on a lever. But fundamentally all it does is a bunch of parallel additions. For the time, it would have done them extremely fast and have had awesome reliability, but the actual computations were perfectly well-understood. By the way, if you're ever in London, you have to go visit the Science Museum in South Kensington and see the [[Difference Engine Mk2|http://en.wikipedia.org/wiki/Difference_engine]] that they built. It's truly fascinating watching it work, and work well. The thing I didn't realise until I read a bit more about it is that it's actually built to achievable 19th-century tolerances, which means Babbage actually could have built the whole thing, and it would have worked convincingly and usefully. His problems were political and financial rather than mechanical, and it didn't help that he was a pompous jerk (like many geniuses).\n\nBut again, perfectly well-understood maths in the Difference Engine. Couldn't do anything revolutionary, just would have done it far better than the existing mechanisms (a room full of people adding stuff manually). No, the real genius came with the Analytical Engine. Again, I've always known it was the first programmable computer, but when people say that - you always imagine that well yes, it was slightly smarter than a [[Jacquard Loom|http://en.wikipedia.org/wiki/Jacquard_loom]], and maybe if you squinted a bit and jumped through many flaming hoops you might see it was getting close to Turing-capable, and if you examined manuals a lot and were cunning you could get it to do something kinda like a proper algorithm. 
Certainly when I've looked at the functionality of some of the 1940s and 50s computers, that's what they always looked like.\n\nNo, that's not the Analytical Engine at all. It's not a bag of bolts that some mathematician can show in 200 pages of jargon can be made to be Turing-complete. It's much better than that - it's basically a 6502 with bells on (literally).\n\n[[Here are lots of details|http://www.fourmilab.ch/babbage/cards.html]] (including an emulator!), but basically it has a fairly straightforward machine language - it does addition, subtraction, multiplication and division. It has two input registers and an output register. You can do loads from a store (memory) with 1000 entries, to the input registers, and when you load the second register, it does the operation you ask, and puts the result in the output register, which you can move to the store. You can also do shifts left and right to move the decimal point around in fixed-point maths. If the result of a computation has a different sign to the first input, or overflows, it makes a "run-up lever" move upwards. One could imagine tying a small bit of cloth to this lever, and then one might term this "raising a flag". You can issue commands that say to move backwards or forwards by a set number of commands, and you can tell it to do this all the time, or only if the run-up lever is raised. Hey - unconditional and conditional branches.\n\nI mean that's it - it's so absurdly simple. It has load, store, the basic six arithmetic operations, and conditional branches. Right there in front of you. It's a processor. You could code on it. 
Look, here's some code (with my comments):\n\n{{{\nN0 6 ;; preload memory location 0 with the number 6\nN1 1 ;; preload memory location 1 with the number 1\nN2 1 ;; preload memory location 2 with the number 1\nx ;; set the operation to multiply\nL1 ;; load memory location 1 into the ALU\nL0 ;; load memory location 0 into the ALU - second load causes the multiply operation to happen\nS1 ;; store the result in location 1.\n- ;; set the operation to subtract\nL0 ;; load location 0\nL2 ;; load location 2 - causes subtraction to happen.\nS0 ;; store to location 0\nL2 ;; load location 2\nL0 ;; load location 0 - causes subtraction to happen.\n ;; If the result sign is different to the first argument, the run-up-lever is raised\nCB?11 ;; if the run-up-lever is raised, move back 11 instructions.\n ;; Like today's CPUs, the location of the "instruction pointer" has already moved past this instruction,\n ;; so back 11 means the next instruction is the "x" above.\n}}}\n\nThe result of this function is in location 1. Notice that location 2 never gets stored to - it's the constant value 1. Still having a bit of trouble? Let me translate it to C:\n\n{{{\n/* sign of x: -1, 0 or +1 */\nstatic int sgn ( int x ) { return ( x > 0 ) - ( x < 0 ); }\n\nint mystery_function ( int loc0 )\n{\n int loc1 = 1;\n const int loc2 = 1;\nkeep_looping:\n loc1 = loc1 * loc0;\n loc0 = loc0 - loc2;\n /* the run-up lever: raised when the result's sign differs from the first input's */\n if ( sgn(loc2 - loc0) != sgn(loc2) )\n {\n goto keep_looping;\n }\n return loc1;\n}\n}}}\n\n...and now change "loc2" to be "1" and massage the "if" conditional appropriately:\n\n{{{\nint mystery_function ( int loc0 )\n{\n int loc1 = 1;\nkeep_looping:\n loc1 *= loc0;\n --loc0;\n if ( loc0 > 1 )\n {\n goto keep_looping;\n }\n return loc1;\n}\n}}}\n\nYou can't still be having trouble. It's a factorial function! Apart from the slight wackiness of the loop counter, which is an idiom just like any other language, it's all absurdly simple and powerful. The only big thing he was missing from the ALU was direct boolean logic for doing complex control logic. 
But it's not a huge lack - first, it can all be emulated without too much pain with decimal logic (OR is an add, AND is a multiply, etc), and secondly after about five minutes playing with the thing he would have realised that he needed something like that, and the mechanicals for boolean logic are trivial compared to decimal multiplication and division.\n\nJust to gild the lily, Babbage wanted to attach a plotter to it. You could send the most recent computed result to the X-coordinate or Y-coordinate of the plotter, and you could raise or lower the pen. He could have coded up Tron light-cycles on the damn thing.\n\nThe one thing he missed (and to be fair so did half the computer pioneers of the 1940s) is the ability to compute addresses, i.e. instead of always looking up the data in location N where N is punched into the card itself, the ability to calculate N with maths. However, he did think of precomputed lookup-tables, which is one form of this, but only loading from it, not storing to it. Amusingly, he then decided that actually, since the table - e.g. of logarithms - had been generated by the AE in an earlier run, and that doing the lookup involved the AE displaying a number and then ringing a bell and getting a man to go rummage through a stack of punched-cards for the right one, which would take time and be error-prone (he also had a primitive error-checking system!), it might just be better if the machine recomputed the desired value there and then, rather than using a LUT at all. Which is almost exactly where we are in computing now, where LUTs are frequently slower and less accurate than just re-doing the computation!\n\nAnd he did all of this in a total vacuum. He had nothing but paper and pen and brass and plans and his brains, and he invented a frighteningly useful, powerful and easy-to-program processor. Including comments and debugging support. They didn't even have a machine as powerful as a cash-register at the time - nothing! 
So when I say Babbage was a genius - he's one of the greats. If he'd succeeded even slightly, if the Difference Engine had got half-finished and produced useful results and people had taken him seriously - all that Turing and von Neumann stuff that we think of as the birth of modern computing - obsolete before they even thought of it, because Babbage got there nearly a century before.\n\nPut it like this - if he had been born almost exactly 100 years //later// and instead of brass and steam had worked in relays and electrickery in the 1950s, he would have made almost exactly the same breakthroughs and probably built almost exactly the same machine - it just would have been a bit faster (but less reliable - bloody valves!). It's almost shameful that in those 100 years, nobody advanced the art of computing in the slightest. And yet think of all the things that have happened in the 50 years since then. I read Sterling and Gibson's book "The Difference Engine" ages ago, and thought it was all a bit of a flight of fancy - pushing the probability curve to breaking-point - the way SF authors should. Now - I'm not sure it was at all fanciful. Imagine if we were living in a world that had had computers for three times as long. Gordon Moore's famous law would have gone totally ballistic by now. Microsoft's motto would have been "a computer in every can of coke", because we'd have had a computer on every desk since about 1910, and the big thing that helped the Allies beat the Nazis wouldn't have been radar, it would have been the Web.\n\nThere's just one bizarre thing I can't figure out. Babbage initially specified the Analytical Engine to 40 //decimal// digits. Later, he upped it to 50. 50 decimal digits is about 166 bits. That's gigantic, even for fixed-point arithmetic. These days we find 32 bits just fine (9-and-a-bit decimal digits) - 64 in some specialised places (19 digits). But 166 bits is nuts - that is colossal over-engineering. 
And it's not because he didn't understand the maths - it's extremely clear from his writings that he understood fixed-point math perfectly well - shifting stuff up and down after multiplication and division, etc. In two separate places he explained how to do arbitrary-precision arithmetic using the "run-up lever" (i.e. the AE version of "add-with-carry") in case people needed 100 or 200 digit numbers. To put this in perspective, the universe is at least 156 billion light-years wide - that's 1.5 x 10^27 meters. A single proton is 10^-15 meters across. So the universe is roughly 1.5 x 10^42 protons in size - which only takes 43 decimal digits to represent. Babbage decided that wasn't enough - he needed the full 50 digits. Also, he specified a memory of 1000 entries of these 50-digit numbers, and that didn't include the code, just data - that's 20kbytes of random-access data. For a smart guy whose major problems were finding good enough large-scale manufacturing technologies and finding the cash to build his gigantic room-sized engine, that seems pretty dumb. If he'd built a 3-digit AE, his instruction set would have exceeded the computing capabilities of every 8-bit machine in the home microcomputer revolution of the 1980s. 5 digits and he'd have beaten the 16-bit PDP11, ST, Amiga and 8086 for per-instruction computing power. If he'd only aimed a little lower, he could almost have built the thing in his own workshop (his son constructed a three-digit Difference Engine from parts found in his house after his death!), instead of spending half his time arguing with his fabrication engineers and begging parliament and the Royal Society for ever-increasing sums of money. What a waste. But still - a genius of the finest calibre.\n\nMy theory is, Babbage was actually given a Speak-and-Spell by Dr. Who. Or maybe Sarah Connor. It's the only rational explanation.
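For anyone who wants to check the digit-versus-bit arithmetic quoted above (50 digits being about 166 bits, a 1000-entry store being about 20kbytes, 3 and 5 digits beating 8 and 16 bits), here is a small sketch. The function names are mine, and the byte count assumes ideal binary packing of the decimal store, which is only a way of comparing sizes and not how the Engine held its numbers.

```c
#include <assert.h>

/* log2(10): how many binary digits one decimal digit is worth. */
#define BITS_PER_DECIMAL_DIGIT 3.321928094887362

/* Whole bits needed to hold any number of the given decimal width. */
unsigned bits_for_decimal_digits(unsigned digits)
{
    assert(digits > 0);
    double exact = digits * BITS_PER_DECIMAL_DIGIT;
    unsigned whole = (unsigned)exact;             /* truncate... */
    return (exact > (double)whole) ? whole + 1    /* ...then round up */
                                   : whole;
}

/* Bytes for a store of `entries` numbers of `digits` decimal digits each,
   assuming (hypothetically) perfect binary packing, rounded down. */
unsigned long store_bytes(unsigned entries, unsigned digits)
{
    double bits = (double)entries * digits * BITS_PER_DECIMAL_DIGIT;
    return (unsigned long)(bits / 8.0);
}
```

Rounded up to whole bits, 50 digits needs 167 bits (the "about 166" above is the unrounded 166.1), 3 digits needs 10 bits and 5 digits needs 17, so the 3-digit and 5-digit machines really would have had slightly wider words than the 8-bit and 16-bit micros.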
I heard an eye-opening rant by Chris Hecker the other day.\n\nNo, not //that// rant - I already knew the Wii was one and a half GameCubes stuck together with duct tape. No, the one about bitangents.\n\n"The what?" I hear you ask. "You mean binormals?" As it turns out - no. The rule about what "bi" means is that some objects have an arbitrary set of properties, and you need two of them. It doesn't matter which two - you just need them to be two different examples. So a curve in 3D space has not just one normal, but loads of normals - a whole plane of them in fact. But you only need two of them - pretty much any two. You call one the "normal" and you call the other the "binormal". The standard classical-language hijack of "biX" just means "two X" or "the second X". And that's where binormal comes from - curves. Note that a curve only has a single tangent vector (well, OK, it has two, but one is the negative of the other).\n\nOK, so now on surfaces, you have a single normal. There really is only one normal. But there's loads of tangents - an entire plane of them - the plane of the surface. And so you need to pick two of them (typically if the surface is textured we're concerned with the lines of constant V and the lines of constant U). So one is called the tangent, and logically the other one should be called the "bitangent". Yes, you heard me - not "binormal".\n\nAnd that's the rant. When doing bumpmapping on a surface, you should talk about the normal, tangent and //bitangent//. Nobody's quite sure why people cottoned on to the word "binormal", and I've certainly spent the last ten years talking about binormals, but you know what - it's still wrong. From now on, I will speak only of bitangents (though that's going to cause chaos with Granny3D's nice backwards-compatible file format).\n\nHere endeth the lesson.
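To make the naming concrete, here is a hedged sketch of the usual per-triangle calculation: the tangent is the direction of increasing U (a line of constant V) and the bitangent is the direction of increasing V (a line of constant U). The types and the function name are my own for illustration, and the results are left unnormalised.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float u, v; } vec2;

/* Solve  e1 = du1*T + dv1*B,  e2 = du2*T + dv2*B  for T (tangent) and
   B (bitangent), where e1, e2 are triangle edges and du, dv are the
   matching texture-coordinate deltas.
   Returns 1 on success, 0 if the UV mapping is degenerate. */
int triangle_tangent_bitangent(const vec3 p[3], const vec2 uv[3],
                               vec3 *tangent, vec3 *bitangent)
{
    assert(tangent && bitangent);

    vec3 e1 = { p[1].x - p[0].x, p[1].y - p[0].y, p[1].z - p[0].z };
    vec3 e2 = { p[2].x - p[0].x, p[2].y - p[0].y, p[2].z - p[0].z };
    float du1 = uv[1].u - uv[0].u, dv1 = uv[1].v - uv[0].v;
    float du2 = uv[2].u - uv[0].u, dv2 = uv[2].v - uv[0].v;

    float det = du1 * dv2 - du2 * dv1;
    if (det == 0.0f)
        return 0;               /* UVs collapse to a line or a point */
    float r = 1.0f / det;

    /* Tangent: direction in which U increases while V stays constant. */
    tangent->x = (e1.x * dv2 - e2.x * dv1) * r;
    tangent->y = (e1.y * dv2 - e2.y * dv1) * r;
    tangent->z = (e1.z * dv2 - e2.z * dv1) * r;

    /* Bitangent (not "binormal"): V increases while U stays constant. */
    bitangent->x = (e2.x * du1 - e1.x * du2) * r;
    bitangent->y = (e2.y * du1 - e1.y * du2) * r;
    bitangent->z = (e2.z * du1 - e1.z * du2) * r;
    return 1;
}
```

For a triangle lying in the XY plane with U along X and V along Y, this gives a tangent of (1,0,0) and a bitangent of (0,1,0), with the one true surface normal along Z.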
The game of the film. Except it had almost nothing to do with the film because we were making it at the same time as the film, didn't know much about their plot, so we just used the same characters in a different story. It had some neat features. The 360-degree fighting was kinda nifty - it used dual-stick "Robotron melee" controls - left one moved Blade, right one would attack in the direction pushed, and the exact attacks used were all down to timing and which weapon you had out.\n\nIt came out on Xbox and ~PS2, and I wrote the Xbox rendering engine and all the other platform-specific stuff (sound, controllers, save/load, etc). From the start we decided the two platforms were too different to share much code, and we were mostly proved right, though in the end we did share more than we thought - mainly the streaming code and some of the lighting stuff. Originally the ~PS2 wasn't going to try to stream data, because we assumed the DVD seek times would kill us, but it worked much better than we thought, and the extra memory really helped. I'm fairly proud of the graphics engine - got a lot of good tech in it such as streaming almost all rendering resources, some neat code to deal with multiple dynamic light sources, compile-on-demand shaders, and some really good VIPM stuff. Other Xbox-only features were some nice self-shadowing with my first experiments with shadowbuffers, and a cool-looking cloth simulation for Blade's cloak.\n\nIncidentally, the lighting was a huge pain. You see, Blade is this black guy who wears black leather and black shades, and he hunts vampires, so we're not going to have any bright sunlit levels - we're pretty much always in sewers at night. Result - he's basically invisible. It's fine to show him as a lurking shadow for a film, but in a game you kinda have to be able to see what's going on. 
Pretty much the only reason you could see him at all was because we cranked up the specular highlights like crazy, and made everything else in the scene fairly bright colours, so at least he's silhouetted.\n\nDespite being happy with it on a technical level, it was still a rather rushed game, and had some pretty rough edges. Some of the levels were rather dull - it would have been better if it had been a shorter game, but back then it wasn't acceptable to ship anything less than 20 hours of play, even if a lot of that play was a bit dull. C'est la vie.
If you're going to add this blog to your blogroll, and thanks very much for doing so, please use http://www.eelpi.gotdns.org/blog.wiki.html, and not the address you actually see in your browser - that one will change if/when I move ~ISPs. Thanks.
Pascal Sahuc emailed me a link to a neat paper on cloud simulation with a cheap CA. [["A Simple, Efficient Method for Realistic Animation of Clouds"|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.4850]] by Dobashi and Kaneda. In fact it's absurdly cheap - each cell has only three bits, and it's all boolean tests. I'm amazed that such wonderful results come out of such simple rules and three bits of state. They do smooth the field before rendering, but it still [[looks really good|http://video.google.com/videoplay?docid=3133893853954925559]].
Another article got converted to HTML, and some hindsights. I wish I'd found a fun game to put the code in - it seemed to work really well, but I couldn't think of anything really cool to do with it. [[Papers and articles section|http://www.eelpi.gotdns.org/papers/papers.html]]
[[HelloThere]] [[GDC 09]] [[How to walk better]] [[Larrabee ISA unveiled at GDC 2009]] [[CAs in cloud formation]] [[Regular mesh vertex cache ordering]] [[Siggraph Asia 2008]] [[Plague]] [[Siggraph 2008]] [[ShaderX2 available for free]] [[Larrabee and raytracing]] [[Renderstate change costs]] [[Larrabee decloak]] [[All blog entries]]
A way of compressing vertex data using the principles of displacement mapping, but without needing complex hardware support.\n\nThe idea is to encode the vertex as three indices to other vertices, plus two barycentric coordinates, plus a displacement along the interpolated normal. This gives you a vertex that is about 8 bytes in size, which is far smaller than most representations, reducing memory footprint and bandwidth requirements. This stream can be easily decoded by ~VS1.1 shader hardware, which is now ubiquitous.\n\nFor more details see:\n* Dean Calver's article in ~ShaderX2 "Using Vertex Shaders For Geometry Compression" [[(available for free now)|http://www.realtimerendering.com/blog/shaderx2-books-available-for-free-download/]]\n* [[My GDC2003 paper on Displacement Mapping|http://www.eelpi.gotdns.org/papers/papers.html]]
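A rough sketch of what the decode step looks like, written as CPU-side C rather than a ~VS1.1 shader. Everything here is an assumption for illustration: the struct layout is unpacked (a real format would quantise the fields to reach the roughly 8 bytes mentioned above), the names are mine, and the interpolated normal is used as-is without renormalising.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Hypothetical unpacked form of one compressed vertex. */
typedef struct {
    unsigned short index[3];   /* three base-mesh vertices                  */
    float bary0, bary1;        /* two barycentrics; third is 1-bary0-bary1  */
    float displacement;        /* distance along the interpolated normal    */
} compressed_vertex;

/* The low-resolution base mesh that the indices refer to. */
typedef struct {
    const vec3 *positions;
    const vec3 *normals;
    size_t count;
} base_mesh;

/* Weighted sum of three vectors. */
static vec3 blend3(vec3 a, vec3 b, vec3 c, float wa, float wb, float wc)
{
    vec3 r;
    r.x = a.x * wa + b.x * wb + c.x * wc;
    r.y = a.y * wa + b.y * wb + c.y * wc;
    r.z = a.z * wa + b.z * wb + c.z * wc;
    return r;
}

/* Rebuild the full position: interpolate position and normal across the
   base triangle, then push the point out along that normal. */
vec3 decode_vertex(const base_mesh *base, const compressed_vertex *cv)
{
    size_t i0 = cv->index[0], i1 = cv->index[1], i2 = cv->index[2];
    assert(i0 < base->count && i1 < base->count && i2 < base->count);

    float w0 = cv->bary0;
    float w1 = cv->bary1;
    float w2 = 1.0f - w0 - w1;

    vec3 p = blend3(base->positions[i0], base->positions[i1],
                    base->positions[i2], w0, w1, w2);
    vec3 n = blend3(base->normals[i0], base->normals[i1],
                    base->normals[i2], w0, w1, w2);

    p.x += n.x * cv->displacement;
    p.y += n.y * cv->displacement;
    p.z += n.z * cv->displacement;
    return p;
}
```

The point of the scheme is that the per-vertex data is just indices, two weights and one scalar, so all the heavy float32 data lives once in the small base mesh.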
If you're going to release a demo, try and minimally playtest it first on some people who (a) have not already played your game and (b) have $50 they can spend on anything they like. If for some bizarre reason they fail to immediately hand over the $50 in exchange for a full copy of the game, you might want to think about tweaking your demo.\n\nHere's the checklist. It's not exactly rocket-science, but it's truly astounding how many demos manage to violate not one point on it, but every single one. It's even more astounding how random downloads from third- and fourth-tier publishers on Fileplanet or wherever manage to consistently score several bazillion points above the first-tier publishers parading their stupidity on the heavily-controlled and scrutinised Xbox Live. Maybe that's because those lowlier publishers have to work for their money - who knows?\n\n(1) To paraphrase Miyamoto - a late demo can eventually be good, while a rushed demo is bad forever. To put it another way - the demo isn't for the fanbois who read the gaming press and would buy a turd-in-a-box from company X. The demo is for the non-hardcore looking for something new. If you release the demo early and it sucks, you will turn away everyone, including the fanbois. If you ship it after the game goes to master and make sure it's not horrible, it will have the most glaring bugs fixed, be more balanced, and you might pull in some longer-term interest and keep yourself from being hurled into the bargain-bin two weeks after release.\n\n(2) If at all possible, include the tutorial. I know you think your precious snowflake is so intuitive it's just pick-up-and-play. But again remember - your demo is NOT for the fanboy who bought the first eight episodes of your game series. It's to find the people who don't have preorders. They may not know how to play your game. They may have never played a game like yours. They may have never played any game. That is probably why they have a surplus of $50 bills. 
One of them could be yours.\n\n(3) If you really can't be bothered with a tutorial, at least have some help screens that tell people what the buttons on the controller do. Maybe even what the strange bars on the screen do, or what the icons on the mini-map do, and what to do about each of them. Especially when they flash - flashing says "I'm important". Not explaining what important stuff is shows disdain for the player. And don't make those screens be the loading screens because guess what - they vanish while the player is reading them. Nothing shows a new player the middle finger quite so directly as a screen full of important text that you then don't let them read properly. They haven't even pressed a button and already they hate you!\n\n(4) Make the demo nice and easy. Demos are not there to present a challenge to the player, they're there to demonstrate the full set of features to them. So for example in a certain fighting game renowned for its boob physics, you might want to slow everything down a bit and make the computer opponents not quite so ... well ... ninja. Then the player will actually be able to experiment with all the cool throws you spent so long doing the game code and animation for, instead of being mid-air juggled for 75% of the health bar the instant they do anything but madly bash the uppercut button.\n\n(5) What is your Unique Selling Point? Why is your game teh ace and all others teh suk? After all, that is why you spent 500 man-years on this product, and that is why your publisher is beating you over the head about deadlines. I may be talking crazy here, but I'm thinking you may wish to explain it to the player. If you don't, the player will be looking at your product and thinking deep philosophical questions such as "why would I spend $50 on this?" You do not want them thinking those sort of questions - you want them thinking "wow - I'm glad I spent $50 on this brilliant game." 
If you can't manage that, at least go for "shame I spent $50 on this game - the USP sounded cool but it wasn't that much fun after level 8". Your bank manager is equally happy with either.\n\nThis is your first contact with a large chunk of your potential audience. They have $50, and they are deciding whether to give it to you, so you can feed your wife/husband/kids/cat/llama/habit. Marketing has done its job, and the punter is finally devoting their full and undivided attention to your game and playing your demo. Now it's all up to you - this is your fifteen minutes of fame. If that person doesn't like your demo, it doesn't matter how hard you crunched to get the game out the door.
With the usual anti-bot obfuscations, my address is: tom ~~dot~~ forsyth ~~at~~ eelpi ~~dot~~ gotdns ~~dot~~ org.
Bonus karma points to Ben Garney of Garage Games who spotted a silly math error of mine in [[Knowing which mipmap levels are needed]]. Now corrected.
Lots of fun. Slightly quieter than last year, but according to many it was because companies are only sending the key people, rather than the massed hordes. Also seemed like a lot fewer of the random mousemat manufacturers, DVD duplicators and the like.\n\nMichael Abrash and I gave our talks. He almost filled his gigantic 1000-person room, and my little 300-person room was totally full - not even standing room available. I made sure I left 10 minutes at the end of my talk for questions, but we ended up with half an hour of questions and they had to kick us out of the room for the next session. So I call that a success.\n\nA general site for all your Larrabee needs is: http://www.intel.com/software/larrabee/\n\nOur slides are available here (scroll down near the bottom): http://intel.com/software/gdc\n\nMichael Abrash's article in Dr. Dobbs Journal is now live: http://www.ddj.com/architect/216402188\n\nThe "C++ Larrabee Prototype Primitives" are available from here: http://software.intel.com/en-us/articles/prototype-primitives-guide/ These allow you to write Larrabee intrinsics code, but then compile and run it on pretty much anything just by adding a #include into the file. The intent was to make it as easy as possible to prototype some code in any existing project without changing the build.\n\nI went to see [[Rune|http://runevision.com/blog/]]'s talk on locomotion (see [[How to walk better]]), and it was really superb. Great tech, great (interactive!) slides, and he's a really good presenter. If you've read his stuff before, go see it again as there's a bunch of stuff I hadn't seen before - tricks like how to stop the ankle flexing bizarrely as people step down from high places.
I was asked recently what the best way to approach [[http://www.gdconf.com/|GDC]] was for the newbie. GDC is a big busy place, and it's easy for people to get a bit lost. Here's a few handy hints:\n\n* Take an hour beforehand to sit down and go through the entire timetable. In each timeslot, mark the top three things you'd like to see, and give them a score from "might be interesting" to "must see". Remember to at least look at the tracks that aren't your main discipline - they sometimes have interesting or crossover stuff. Write these scores down, and when you actually get to GDC, transfer them onto the at-a-glance schedule thingie you get in the welcome pack, then you can refer to it quickly during the day.\n\n* Prioritise sessions you don't know much about but sound interesting over things that are right in the middle of your primary field. For example, I'm a graphics programmer who's already worked a lot with DX10, so there's probably not a great deal of point going to a talk called "D3D10 Unleashed: New Features and Effects" - it's not going to teach me much new stuff. But I've not done all that much general-purpose multi-processor work, so the talk called "Dragged Kicking and Screaming: Source Multicore" sounds fascinating and will probably give me some new insight when I do start doing multi-processor work.\n\n* Be prepared to miss a lot of the lectures - GDC is pretty chaotic. Allow it to be chaotic - don't try to control it too much. If you're talking to someone interesting, and they're not rushing off to something, keep talking to them. You can always go into the lecture for the second half, or at worst you can pick up the lecture notes later. But you might never meet that interesting person again.\n\n* Don't spend too long on familiar stuff. GDC's meant to be about new stuff, so don't just hang out at a bar and gossip with your friends all day. Hang out at a bar and gossip with //new// people instead.\n\n* Don't spend too long in the expo. 
Have a quick walk around fairly early on and check the place out, see what might be interesting, but don't hang around or play with stuff. There's two halls this year, and with E3 gone it's even bigger and probably noisier than previous GDCs. If you do find something interesting, note it down for later so when you do find yourself with a spare half hour, or during lunch, you can go back and have a better look. Similarly, if the expo is crowded, go do something else. It's probably lunch-time or something. Come back in an hour and it will be quieter. The booths and the people on them will still be there, and you'll actually be able to see something and talk without shouting. Also don't do the expo on the first day or the last day - that's what everyone else does.\n\n* Remember that there's two expo halls this year.\n\n* If after five minutes of a lecture you find the lecturer is dull, or presents badly, or is just reading off the slides, leave. It's obvious you're going to get just as much from reading the slides later as you will from seeing the talk in person, and it'll be quicker. Your time is valuable - go to the next-best-rated lecture on your list. Some lectures are 10x better in person - those are the ones you should be watching. If you feel too embarrassed to just walk out, take your phone out as if you had it on vibrate and just got a call or a text message, and walk out glaring at it.\n\n* During the lectures, have a notepad and pen out. When you have a question, scribble it down. At the end of the lecture, the presenter will ask for questions, and there's always a deafening silence for a few minutes while everyone remembers what their questions were. That's when you can stick your hand up or go to the microphone and ask your question.\n\n* Fill in those feedback forms. The GDC committee do collate them and do take them pretty seriously when considering who to invite back to speak next year. 
The authors do also get the feedback and some of the comments, so if they have good info but are a terrible speaker, say so. And use the whole scale. I forget what it's out of, but if it's 1-10, 1 = bad, 5 = good, 10 = excellent. No point in messing around with 6, 7, 8 as scores - your voice will be lost in the statistical noise.\n\n* Everyone always asks me about the parties and which ones to go to. I'm not a party animal, so I don't enjoy them for what they are. And as far as socialising, they're terrible. Talking is almost impossible and unless you're very wealthy or persistent, you'll be hungry and thirsty. Much better to hang out in a hotel bar or lobby with some friends. You'll still see plenty of game geeks doing the same, but you'll be able to talk and eat. And it's far more relaxing, which means you can gossip til the early hours and still be reasonably awake for the first lecture next day.\n\n* Don't worry if you think you're missing all the cool stuff. Everybody always misses all the cool stuff. This year they're in SF rather than San Jose, so nobody knows where anything is, so even old pros will be confused.\n\n* Go to the Indie Game Festival if you can. Definitely visit the IGF pavilion to check out the games and chat to the developers. The standard is awesomely high every year.\n\n* Play the meatspace MMO game - this year it's "Gangs of GDC". Doesn't usually take up a significant amount of time. It's a good excuse to be silly and talk to people.\n\n* The Programmers Challenge is back! Always amusing for the uber-geek.\n\n* The Developer's Choice Awards are worthwhile and not as bogus as most of the other awards. Go to them and cheer for people that don't suck.
''WORK IN PROGRESS'' - which is why it's not on the main page yet.\n\nA topic guaranteed to cause an argument amongst the shipping veterans is where you should do editing of levels, scripts, shaders, etc. There's three obvious options, although different aspects may use different ones, e.g. shader editing may be done in one place while level editing done in another:\n\n-A plugin for your DCC tool of choice (Max, Maya, XSI, etc)\n-A standalone editor.\n-A part of your game.\n\nThe second two are obviously close - the difference being whether you have to do a save/convert/reload cycle every time you want to test something in the actual game. For some console developers, this is a more obvious distinction as it means they don't have to make a PC version of their game simply to test & edit it. To simplify the argument, I'm going to lump the second two together and say there's two choices - embedded in your DCC tool, or as a fully-custom editor that shares as much as possible with your game engine.\n\nThe advantages of making your game be the editor are obvious. You get to see things as they really are, almost instantly. You're using the real rendering engine, you're using the real asset-management system, and you're using the real physics engine. If some cunning design or model won't work inside your engine, you'll know about it immediately, not three months later when you've already authored a bunch of content. Your artists and designers can experiment however they like, because they see the results immediately and know if they look good or play well. The biggest advantage, if you make it right, is that level designers can set up objects and scripts, hit a button, and they're instantly playing the game. Keeping that design/test/tweak iteration cycle as fast and tight as possible is the key to excellent gameplay. 
The same is true of lighting (whether realtime or lightmap-driven) and sound environments - both of which rely on having large chunks of the environment around to see whether they work or not, and are almost impossible to do in small unconnected pieces, or without very fast previews.\n\nThe disadvantages are that it's more complex to make an engine that can run in two different modes - editing, and playing. The editing mode will be running on a powerful PC, often with unoptimised assets (optimising takes time and leads to longer turnaround cycles - you don't want that).\n\nThe arguments for putting the editing tools inside the DCC tool are obvious - your artists already know how to use the tool, and it's a bunch of editing & rendering code that's already been written. Why reinvent the wheel? Ordinarily I'd agree - wheel reinvention is a plague, and burns far too much programmer time in our industry. But hang on a bit - is that really true?\n\n\n\n\n\nFundamentally, there's a few big problems we have to deal with:\n*Previewing assets in the actual game environment. That means shaders, textures, lighting models, shadows, lightmaps. If the artists can't see all that as a fast preview while they're building the model, they're going to be doing a lot of guesswork.\n*Level design. Placeholder artwork, hacked-in control schemes, pre-scripted actor movement - doesn't matter - you need to give the designers tools to work with so they can start getting the broad-strokes design stuff up and running so that everyone can get a feel for what the game is going to be about.\n*Sound & animation design. Seems odd to put these two together? Not really. Both have a time component to them, and they need smooth blending. This makes them fundamentally different things to rendering and lighting. But you still need to be able to easily preview them in the real game. Also, animations frequently have sound triggers on them - footsteps, equipment sounds, etc. 
For bonus annoyance, typically the animator and the sound guy aren't the same person, don't know how to use each other's tools, and will want to edit the files independently.\n*Keep iteration times down. And by "down" I mean less than 15 seconds from tweaking a model/texture/light/object placement/trigger/script to trying it out in the game. We all know that iteration and evolution is at the heart of a good game, so we need to keep the dead time in that iteration loop to the lowest possible.\n\nMy preferences:\n\nMake your engine able to reload assets at the press of a button. Having to shut it down and start it up wastes time, especially as those are the areas you tend to optimise last in a game. You need your artists to be able to edit something in the DCC tool, hit a button, and have the results visible right then. This requires a decent amount of infrastructure, though a lot of that is shared with things like streaming systems (which are really good things to have), and stuff like dealing with Alt+Tab on ~PCs. Even if all you can do is reload existing assets, that's a big time-saver. If you can go further and import new models with new texture names, etc. that's even better.\n\nMake sure your engine can read assets directly from the exported data, or do so with minimal and fully-automated processing. It doesn't have to render them quickly, and you obviously need another optimised format that you will actually ship data in, but you also need something with a very quick turnaround time. It doesn't matter if the model the artist is currently editing is loading all its textures from raw ~TGAs - we all have big video cards these days. Once they're happy with the textures, a batch script or tool can compress them down to your preferred format. 
This can either be done overnight, or on another preview button so the artist can see what the compressed version actually looks like.\n\nDitto with things like vertex cache reordering and optimal animation-bone partitioning - none of that stuff matters while the artist is actually editing the model - do it in a batch process overnight. One of the really nice things about Granny3D that we've focussed on a fair bit over the years is the forward and backward compatibility of the file format - you can store all sorts of data in a .~GR2 file, both raw and compressed, and they should all be loadable by anything at any time.\n\nOne subtlety here is that a processor should never modify original data. The stuff that comes out of the DCC tool should be minimally processed during the export process - convert it to a standard format, but leave as much data in there as you can. Later processing steps can strip that data out, but they will do so with a different file as a target, not an in-place modification. That way when the optimised format changes, you don't need to fire up the DCC tool and do a whole bunch of re-exporting - you can just run a batch file on all the raw exported data and have it in the new and improved format.\n\nThe level & lighting editor should be the game. Using Max/Maya/XSI/Lightwave as a level editor never works well. I've tried it both ways, and I've discussed this with others. Think about it - all you're getting out of your DCC tool is a camera movement model (easily replicated), a bad renderer (you already have a good renderer), some raycast mouse-hit checking (every game has these already) and a bad button UI that I guarantee you will annoy the snot out of you when you actually go to use it. Trying to use the DCC tool's interface to do any serious previewing is horrible - just think about trying to implement some sort of shadowbuffer rendering, or implementing the movement logic of the character inside a DCC tool - it's pretty absurd. 
This is not reinventing the wheel - there's just so little of a DCC tool you actually need or want in something of the scope of a level editor.\n\nThe animation & sound editor should absolutely be the game. There's just no doubt here - DCC tools don't help at all. Because you need to be able to see the blending and transitions between different animations, and the 3D sound environment as it really will be, a DCC tool doesn't bring anything to the table. The way I've seen animation editors done really well in the past is that the game is playable (ideally with control over any character) and the last thirty seconds are continuously recorded. At any time the animator/sound guy can pause and scrub through that time window, tweaking blends and when sound events happen.\n\nSo that's my rant - the smartest coders should be tools coders, and they need to write level editors and makefile-driven processing chains. DCC tools are for Digital Content Creation on a micro scale - individual characters, individual animation sets. They're not for entire levels, and they're really bad at previewing.
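The "never modify original data" rule and the makefile-driven processing chain can be sketched in a few lines. This is only an illustration under assumptions of mine: `process_tree`, the directory layout and the `process` callback are hypothetical names, not anything from Granny3D or a real pipeline. The raw exports live in one tree, processed output goes to a separate tree, and a file is only rebuilt when its raw source is newer than the existing output.

```python
import os


def process_tree(raw_dir, out_dir, process, out_ext):
    """Run process(src, dst) for every stale raw file. Never writes into raw_dir.

    raw_dir, out_dir, process and out_ext are hypothetical parameters
    chosen for this sketch.
    """
    if os.path.abspath(raw_dir) == os.path.abspath(out_dir):
        raise ValueError("output must go to a different directory than the raw data")
    rebuilt = []
    for root, _dirs, files in os.walk(raw_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, raw_dir)
            dst = os.path.join(out_dir, os.path.splitext(rel)[0] + out_ext)
            # Makefile-style staleness check: skip if the output is up to date.
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            process(src, dst)  # reads src, writes dst, leaves src untouched
            rebuilt.append(rel)
    return sorted(rebuilt)
```

When the optimised format changes, you would delete the output tree (or bump the extension) and re-run the same call over the untouched raw exports, with no DCC tool involved.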
This is pretty handy - [[GameMiddleware|http://www.gamemiddleware.org/middleware/]] - a website that is just a big list of games middleware. I'm so tired of people reinventing perfectly good wheels. Unless your game is going to focus on having the best system X in the world, why not go and look if someone's already got a perfectly good X you can just buy? Then you can spend your time solving some more interesting problems.\n\nAlso tinkered with the titles. Not much point having the titles just be the date when the date's written just below them. And the RSS looks more sensible that way.
Granny3D is a big bag of goodness from [[RadGameTools|http://www.radgametools.com]]. It's an exporter of all sorts of data from Max, Maya and XSI, it's an asset pipeline manipulation and conversion tool, it's a normal-map generator, it's a really cool file system that copes with endianness, 32 or 64 bit pointers, and versioning, it ships on 11 different platforms, and above all it's the world's most superlative runtime animation system. We kick arse and don't bother taking names.\n\nI say "we", because it was written by [[Casey Muratori|http://www.mollyrocket.com/]] originally, handed over to me for two exciting years, and then taken over by [[Dave Moore|http://onepartcode.com/]]. But as I work in the office next to Dave, I'm not so much the spiritual advisor as the ghostly haunting. I'm sure Dave dreads me coming in and starting sentences with "oh yeah and another thing I always meant to implement..."\n\nBut now that I'm on the SuperSecretProject, I'm officially off Granny3D. Not that it ever stops me pimping it like crazy to anyone within earshot. Seriously - it's a bunch of really cool tech, a bunch of rather tedious tech, a bunch of gnarly problems solved by the brightest minds in the business (and me), and it's all wrapped up into a lovely little bundle that if you were to replicate yourself would make you officially insane. Pay the money and get on with something more interesting! When I think of all the man-years it would have saved me when working at MuckyFoot, ooh it makes me mad. 
As Dave said once "we can't replace your existing animation coder, but we can paint a [[big red S|http://en.wikipedia.org/wiki/Superman]] on his chest".\n\nAs time goes on, Granny becomes more and more about the entire toolchain - taking your mesh, texture, tangent space, animation and annotation data from the various DCC tools, exporting it in a reliable robust cross-platform format, processing that data in a large variety of ways, and then finally delivering the data in optimised form to your game engine. This is an under-appreciated area of games programming. It's not viewed as a sexy job for coders, and it tends to be done hastily, often by junior programmers. As a result, it is usually a source of much pain, not least by the artists and designers who have to use it. Don't let this happen to you - start your toolchain on a solid foundation.
Hello and welcome to my blog. Always fresh, always technobabble. Read the chronological blog that is usually listed below, or use some of the links to the left. Or EmailMe.
A while back I made a "How to Walk" demo for Granny3D (slides available here: [[http://www.eelpi.gotdns.org/papers/papers.html|http://www.eelpi.gotdns.org/papers/papers.html]]) that showed how to do interpolation of walking stances, how to cope with uneven terrain and do foot IK. It's trickier than it looks. I always meant to keep updating that demo with more features - most notably missing was how to stop! I'm sure many of you remember it from GDC as "the one with Granny in a Xena: Warrior Princess outfit". It's certainly... eye catching (thanks to [[Steve Theodore|http://www.bungie.net/Inside/MeetTheTeam.aspx?person=stevet]] for the memorable artwork). Anyway, I got distracted by Larrabee, and never did add those features, which I always felt bad about.\n\n[[Rune Skovbo Johansen|http://runevision.com/blog/]] has done a similar system, but taken it a lot further. He uses fewer animation cycles and copes with more terrain. Cool features include 8-way motion, ledge avoidance, stop/start, leaning into slopes, four-legged animals, and foot placement on stationary rotation. It's a very comprehensive locomotion system, and I'm going to see if I can get to [[his GDC talk|https://www.cmpevents.com/GD09/a.asp?option=C&V=11&SessID=9004]] on it.
I've converted my old article on impostors to HTML so it's "live" and searchable and so on. Also added a few hindsights. The article was written before [[StarTopia]] was published, so there's a few implementation notes added about what did and didn't work in practice. It's in the [[papers and articles section|http://www.eelpi.gotdns.org/papers/papers.html]]
People have been bugging me for ages, so I guess it's time to start a blog. I'll try not to ramble too much.\n\nFirst up - [[The Elder Scrolls IV: Oblivion|http://www.elderscrolls.com/]] - what a totally fantastic game. I'm playing it on the Xbox 360 and it's gorgeous. Yes, yes, it's got gameplay up the wazoo as well (that's why I'm 30 hours in and barely feel like I've started), but I'm a graphics tech whore, so I reserve the right to spooge over the pixels. It's amazingly immersive - they've worked out that immersion is defined not by your best bit, but by your worst, and they've spent a lot of time filing off all the rough edges. It's one of the few games where the screenshots on the website are what it looks like //all the time//. Other games, you walk around and there's texture seams all over the place, and when you get close to some things it's really obvious they're just a big flat billboard, or whatever. Not in this. I spent quarter of an hour yesterday just walking around looking at trees and plants and fallen logs and lakes and little stone bridges and the way the sunlight went through the leaves - totally forgot about playing the game until a mud crab almost tore off my leg.\n\nI probably need to get out more into the real world - I hear they have some pretty neato trees and stuff as well, not to mention the world's most powerful photon-mapper. And fewer mud crabs.
There's a fundamental signal-processing concept called various different names, mostly with the word [[Nyquist]] in them, which (if you wave your hands a lot and ignore the screams of people who actually know what they're talking about) says that an object that is 32 pixels high on screen only needs to use a 32x32 texture when you're rendering it. If you give it a 64x64 texture, not only will the mipmapping hardware ignore that extra data and use the 32x32 anyway, but //it's doing the right thing// - if you disable the mipmapping and force it to use the larger version, you will get sparkling and ugliness.\n\nObviously it's more complex than that, and I'll get into the details, but the important point is - the size of texture you need for an object is directly proportional to the object's size on-screen. And that's what texturing hardware does - it picks appropriate mipmap levels to ensure that this is the size it's actually using. Every graphics programmer should be nodding their head and saying "well of course" at this point. But there's another thing that is subtly different, but important to realise. And that is that so far I haven't said how big my texture is, i.e. what size the top mipmap level is - because it doesn't matter. Whether you give the graphics card a 2048x2048 or a 64x64, it's always just going to use the 32x32 version - the only difference is how much memory you're wasting on data that never gets used.\n\nIf you are streaming your textures off disk, or creating them on-demand (e.g. procedural terrain systems), this can be incredibly useful to know. If you don't need to spend cycles and memory loading big mipmap levels for objects in the distance, everything is going to load much quicker and take less memory, which means you can use the extra space and time to increase your scene complexity. Obviously my approximation above is very coarse - it's not just size in pixels that matters, because we use texture coordinates to map textures onto objects. 
So how do you actually find the largest mipmap you need?\n\nAs a thought experiment (this is going to start out absurd - but bear with me), take each triangle, and calculate how many texels the artist mapped to that triangle for the top mipmap level. Then calculate how many pixels that triangle occupies on the screen. If the number of texels is more than the number of pixels, then by the above rule you don't need the top mipmap level - throw it away, and divide the number of texels by 4. Keep doing that until the number of texels in the new top mipmap level is less than the number of screen pixels. (keen readers will spot that trilinear filtering means you actually need one more mipmap level than this - to keep things simple, I'm going to ignore that for now, and we can add one at the end). Do this for all the triangles that use a certain texture, and throw away all the mipmap levels that none of the triangles need.\n\nThat's the concept. Obviously far too expensive to do in practice. The first step is to get rid of the loop and say that the number of mipmap levels you can throw away = 0.5*log2 (top_mipmap_texel_area / screen_area). Quick sanity check - 128x128 texture has area 16384 texels. If you draw it to a 32x32 pixel square on screen, that's 1024 pixels, so you need to throw away 0.5*log2(16384/1024) = 0.5*log2(16) = 2 mipmap levels, which is correct. Why multiply by a half? Because we actually don't want log2(x), we want log4(x), because the number of texels drops by 4x each mipmap level, not 2x. But log4(x) = 0.5*log2(x).\n\nObviously, you can precalculate the value "top_mipmap_texel_area" - once a mesh is texture-mapped, it's a constant value for each triangle. But calculating the screen-area of each triangle each frame is expensive, so we want to approximate. 
If we pick an area approximation that is too high, then our value of screen_area will be higher than the true one for the current frame, and we'll throw away fewer mipmap levels than we could have done. This isn't actually that bad - yes, we waste a bit of memory, but the texturing hardware still does the right thing, and will produce the right image. So approximating too high doesn't change image quality at all. Given that, the approximation we make is to assume that every triangle is always parallel to the screen plane - that its normal is pointing directly at the viewer. This is the largest the triangle can possibly be in screen pixels. In practice it will be at an angle and take fewer screen pixels.\n\nSo how large is this maximum size? Well, if we assume mesh deformation and skinning don't do crazy things and ignore them, a triangle always stays the same size in world units (e.g. meters). At distance D from the viewer, with a horizontal screen field of view of F radians, and a horizontal pixel size of P, the screen length of m world meters is (m/D)*(P/tan(F)) pixels in length. That's world length to screen length, but we want world area to screen area, so we square the result. But if we feed in world area of the triangle, M=m^2, then we get the screen area is (M/(D^2))*((P/tan(F))^2) = M*((P/(D*tan(F)))^2). As a quick sanity-check: double the distance away = quarter the screen pixels. Double the screen resolution = four times the pixels. Widen the FOV = smaller screen area.\n\n(P/tan(F))^2 gets calculated once a frame, so that's a trivial cost. And we'll make another approximation that the distance D is not done at every triangle, it's measured from the closest point of the bounding volume of the object. Again, we're being conservative - we're assuming the triangles are closer (i.e. larger) than they really are. So for a single mesh, we can calculate ((P/(D*tan(F)))^2) just once, and then use it for all the triangles. 
For brevity, I'll call this factor Q, so for each triangle, screen_area <= (world_area * Q).\n\nSo looking back at the mipmap calculation, we calculate 0.5*log2 (top_mipmap_texel_area / screen_area ) for each triangle, take the minimum of all those, and that's how many mipmap levels we can actually throw away.\n\n.... min_over_mesh (0.5*log2 (top_mipmap_texel_area / screen_area))\n >= min_over_mesh (0.5*log2 (top_mipmap_texel_area / (world_area * Q)))\n..= min_over_mesh (0.5*log2 ((top_mipmap_texel_area / world_area) / Q))\n..= 0.5*log2 (min_over_mesh(top_mipmap_texel_area / world_area) / Q)\n\nAnd of course the value (top_mipmap_texel_area/world_area) is a constant for each triangle in the mesh (again, assuming skinning and deformation don't do extreme things), so the minimum value of that is a constant for the entire mesh that you can precalculate. The result is that we haven't done any per-triangle calculations at all at runtime. If you grind through the maths a bit more, you find that it all boils down to A+B+log2(D), where A is a per-frame constant, B is a per-mesh constant, and D is the distance of the mesh from the camera (remember that log2(x/y) = log2(x)-log2(y)). Again, quick sanity check - if you double the distance of a mesh from the viewer, log2(D) increases by 1, so you drop one mipmap level. Which is correct.\n\nThis is pretty nifty. If you're doing a streaming texture system, it means that for each mesh that uses a texture, at runtime you can do a log2 and two adds each frame and you get a result saying how many mipmap levels you didn't need. If you remember that throwing away just one mipmap level saves 75% of the texture's memory, this can be a huge benefit - who doesn't want 75% more memory!\n\nThis isn't just theory - I implemented it in the streaming system used for [[Blade II]] on the Xbox and ~PS2 engines at [[MuckyFoot]], and it saved a lot of memory per texture. This meant we could keep a lot more textures in memory. 
As well as allowing more textures per frame, it also meant that we could keep a lot more textures cached in memory than we actually needed for that frame. This allowed us to stream far more aggressively than we originally intended. The initial design for the ~PS2 engine was to not stream - we assumed that DVD seek times would cripple a streaming system. But because there was a lot more memory available for textures, we could prefetch further ahead and reorder the streaming requests to reduce seek times. And the system actually worked pretty well.\n\n''Flip the Question''\n\nThe next neat trick is instead of asking "how many mipmap levels do I throw away", you ask "how many do I need". This is obviously just the same number, subtracted from log2(texture_size). If we go back to the original equation of 0.5*log2 (top_mipmap_texel_area / screen_area) - look at how you'd actually calculate "top_mipmap_texel_area" for a triangle. You find the area in the idealised [0-1] UV texture coordinate space, and then you multiply by the number of texels in the texture. So the answer to the new question "how many mipmaps do I need" is:\n\n...log2(texture_size) - 0.5*log2 ((texture_size^2) * area_in_uv_space / screen_area)\n= log2(texture_size) - 0.5*log2 (area_in_uv_space / screen_area) - 0.5*log2(texture_size^2)\n= log2(texture_size) - 0.5*log2 (area_in_uv_space / screen_area) - log2(texture_size)\n= - 0.5*log2 (area_in_uv_space / screen_area)\n= 0.5*log2 (screen_area / area_in_uv_space)\n\nHey! What happened to the texture size - it all canceled out! Yep - that's right. It goes back to something I mentioned right at the top. The texture selection hardware does not care what size the top mipmap level is - unless it wants one that doesn't exist, obviously. But if you give it a large enough texture, it doesn't matter how big that texture is.\n\nAgain, every graphics coder is saying "right, I knew that, so?" But this is actually quite a non-obvious thing. 
It means that if you have a streaming texture system, the only thing that matters is "how many mipmaps do I need to load into memory" - the rest are left on the DVD and not loaded into memory. And the answer to that question is independent of the size of the original texture - it only depends on the mesh and the UV mapping and the distance from the camera. So the actual graphics engine - the streaming and rendering - //doesn't care how large the textures are//. And that means that the rendering speed, the space taken in memory, and the time needed to read it off disk are all identical even if the artists make every texture in the entire game 16kx16k.\n\nTo say it again: the only limit to the size of textures the artists can make is the total space on the DVD, and the time taken to make those textures.\n\nI finally grokked this when I was making the [[MuckyFoot]] engine to be used for the games after [[Blade II]] and I kicked myself. It had actually been true for the [[Blade II]] engine - I just hadn't realised it at the time. But it was a pretty cool thing to go and tell the artists that there were no practical limits to texture sizes. They didn't really believe me at first - it's an almost heretical thing to tell 15 artists who have spent their professional lives with coders yelling at them for making a texture 64x64 instead of 32x32. Nevertheless, if you have a streaming system that only loads the mipmap levels you need, it is absolutely true.\n\n''Practical Problems''\n\nOf course there's still some practical limits. One thing that breaks this is the standard artist trick of using fewer texels for less important things, such as the soles of shoes or the undersides of cars. What happened was that they would use all this new texture space for interesting things like faces and hands, so the texture size would grow. But the soles of the feet would stay the same size in texels - because who cares about them? 
Then the preprocessor would calculate the mipmap level it would need for the texture, and the soles of the feet would dominate the result, because it would try to still map them correctly - it doesn't know that they're less important than everything else. There's a few ways to solve this - ideally the artists would be able to tag those faces so that they are ignored in the calculation, but this can be tricky to retrofit to some asset pipelines. A way that I found worked pretty well was to take the minimum linear texel density for each triangle (this copes with stretches and skewed texture mappings), and then for the mesh take the maximum of those minimums. This will find triangles on things like the face and hands, and that will be the density you assume the artist intended. If they didn't give as many texels per meter to other triangles, you assume that was intentional - that they deliberately wanted lower resolution on those areas.\n\nOne counter-intuitive thing is that the dead-space borders between areas in a texture atlas need to grow as well. If your artists were making 256x256 textures with 4 texels of space between parts, when they up-rez to 1024x1024, the space needs to grow to 16 texels as well. The border space is there to deal with bleeding when you create mipmaps, so you need to make sure that e.g. the 64x64 mipmap is the same in both cases.
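For anyone who'd rather see the mipmap maths as code, here's a minimal sketch in Python. It is not the [[Blade II]] code - the function names and the list of (area_in_uv_space, screen_area) pairs are made up for illustration, and it takes the per-triangle screen area as given rather than deriving it from Q and the distance.

```python
import math

def levels_to_drop(top_mipmap_texel_area, screen_area):
    """Mip levels one triangle lets us throw away: 0.5*log2(texels / pixels)."""
    return 0.5 * math.log2(top_mipmap_texel_area / screen_area)

def levels_needed(area_in_uv_space, screen_area):
    """Mip levels one triangle needs in memory. No texture size anywhere."""
    return 0.5 * math.log2(screen_area / area_in_uv_space)

def mesh_levels_to_drop(triangles, texture_size):
    """Minimum over the mesh. `triangles` is a hypothetical list of
    (area_in_uv_space, screen_area) pairs; texture_size is the width of the
    top mipmap level in texels (square texture assumed)."""
    texels = texture_size * texture_size
    return min(levels_to_drop(uv * texels, px) for uv, px in triangles)

# One triangle covering a quarter of UV space and 4096 pixels on screen,
# mapped with a 1024x1024 texture and then with a 4096x4096 one.
for size in (1024, 4096):
    drop = mesh_levels_to_drop([(0.25, 4096.0)], size)
    need = math.log2(size) - drop
    print(size, drop, need)
```

The bigger texture lets you drop more levels, but the number you need comes out the same for both sizes, which is the whole point of the "flip the question" section. A real system would also round the result and clamp it so it never asks for a level that doesn't exist.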
''WORK IN PROGRESS'' - which is why it's not on the main page yet.\n\nPaint mipmap levels with alpha=1 for "valid", alpha=0 for "invalid". Sample twice, once with LOD clamp, and blend based on alpha.
Codename for an Intel processor architecture that I work on. There's been some disclosure, but not everything yet. Coming soon! Top links at the moment:\n\n[["Larrabee: A Many-Core x86 Architecture for Visual Computing" - presented at Siggraph 08|http://www.siggraph.org/s2008/attendees/program/item/?type=papers&id=34]]\n[[Three short Larrabee presentations at Siggraph 08 as part of the "Beyond Programmable Shading" course|http://s08.idav.ucdavis.edu/]]\n[[A pretty comprehensive AnandTech article|http://anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3367&p=1]]\n[[The Wikipedia page|http://en.wikipedia.org/wiki/Larrabee_(GPU)]]
Yay! Michael Abrash and I are finally going to do a talk each about the instruction set we helped develop: [[Rasterization on Larrabee: A First Look at the Larrabee New Instructions (LRBni) in Action|https://www.cmpevents.com/GD09/a.asp?option=C&V=11&SessID=9138]] and [[SIMD Programming with Larrabee: A Second Look at the Larrabee New Instructions (LRBni) in Action|https://www.cmpevents.com/GD09/a.asp?option=C&V=11&SessID=9139]]. They're pretty much a pair - going to mine without seeing Mike's first won't mean very much for example. Note that these aren't graphics talks, they're for anybody who wants to program these cores in assembly or C. We will be using parts of the graphics pipeline as examples, because that's what we've spent most of our time doing, but there's no graphics knowledge required at all. Just bring your assembly head - we're going all the way to the metal.\n\nThere's very few programmers that can say they got to invent their own [[ISA|http://en.wikipedia.org/wiki/Instruction_set]], and I'm really looking forward to finally being able to talk about it in public after about three years of secrecy. Creating an ISA is all about compromises between programmer flexibility and how difficult the hardware is to build, but I am always astonished how much we managed to pack in without too much screaming from the hardware folks. It will be interesting seeing how people react to it - it's got a lot of funky features not found in other ISA that I know of.
I've been trying to keep quiet, but I need to get one thing very clear. Larrabee is going to render ~DirectX and ~OpenGL games through rasterisation, not through raytracing.\n\nI'm not sure how the message got so muddled. I think in our quest to just keep our heads down and get on with it, we've possibly been a bit too quiet. So some comments about exciting new rendering tech got misinterpreted as our one and only plan. Larrabee's tech enables many fascinating possibilities, and we're excited by all of them. But this current confusion has got a lot of developers worried about us breaking their games and forcing them to change the way they do things. That's not the case, and I apologise for any panic.\n\nThere's only one way to render the huge range of ~DirectX and ~OpenGL games out there, and that's the way they were designed to run - the conventional rasterisation pipeline. That has been the goal for the Larrabee team from day one, and it continues to be the primary focus of the hardware and software teams. We take triangles, we rasterise them, we do Z tests, we do pixel shading, we write to a framebuffer. There's plenty of room within that pipeline for innovation to last us for many years to come. It's done very nicely for over a quarter of a century, and there's plenty of life in the old thing yet.\n\nThere's no doubt Larrabee is going to be the world's most awesome raytracer. It's going to be the world's most awesome chip at a lot of heavy computing tasks - that's the joy of total programmability combined with serious number-crunching power. But that is cool stuff for those that want to play with wacky tech. We're not assuming everybody in the world will do this, we're not forcing anyone to do so, and we certainly can't just do it behind their backs and expect things to work - that would be absurd. 
Raytracing on Larrabee is a fascinating research project, it's an exciting new way of thinking about rendering scenes, just like splatting or voxels or any number of neat ideas, but it is absolutely not the focus of Larrabee's primary rendering capabilities, and never has been - not even for a moment.\n\nWe are totally focussed on making the existing (and future) DX and OGL pipelines go fast using far more conventional methods. When we talk about the rendering pipeline changing beyond what people currently know, we're talking about using something a lot less radical than raytracing. Still very exciting for game developers - stuff they've been asking for for ages that other ~IHVs have completely failed to deliver on. There's some very exciting changes to the existing graphics pipelines coming up if developers choose to enable the extra quality and consistency that Larrabee can offer. But these are incremental changes, and they will remain completely under game developers' control - if they don't want to use them, we will look like any other fast video card. We would not and could not change the rendering behaviour of the existing ~APIs.\n\nI'll probably get in trouble for this post, but it's driving me nuts seeing people spin their wheels on the results of misunderstandings. There's so much cool stuff on the way that we're excited about, and we think you're going to love them too.
Well, since [[Beyond3D|http://www.beyond3d.com/content/news/557]] picked up the story, I guess I should comment. I've not technically moved to Intel just yet, but the plans are in motion. Being one of those scary foreigners, there's inevitably some paperwork to sort out first. The wheels of bureaucracy grind slowly.\n\nThe SuperSecretProject is of course [[Larrabee|http://www.google.com/search?q=Larrabee+intel]], and while it's been amusing seeing people on the intertubes discuss how sucky we'll be at conventional rendering, I'm happy to report that this is not even remotely accurate. Also inaccurate is the perception that the "big boys are finally here" - they've been here all along, just keeping quiet and taking care of business.\n\nThat said, there's been some talented people joining the Larrabee crew - both as Intel employees and as external advisers. That includes "names" that people have heard of, but also many awesomely smart people who haven't chosen to be quite as visible. It's immensely exciting just bouncing ideas off each other. Frankly, we're all so used to working within the confines of the conventional GPU pipeline that we barely know where to start with the new flexibility. It's going to be a lot of fun figuring out which areas to explore first, which work within existing game frameworks, and which things require longer-term investments in tools and infrastructure - new rendering techniques aren't that much use if artists can't make content for them.\n\nExpect to hear much much more later (you just try and stop me!), but for now we must focus on getting things done. There's plenty of time for public discussions of algorithms and suchlike in the future. Thanks for the attention, but I need to engage the Larrabee cloaking field once more. I now return you to your irregularly scheduled blog.
One of my papers turns out to be useful to someone. And they asked me what my license was - or rather, their legal department asked. This then forced me to find a suitable license, which is very annoying, because the real one is of course "don't be a dick". I don't think the lawyers liked that one. After much browsing at [[http://www.opensource.org/licenses/|http://www.opensource.org/licenses/]] and finding a lot of wordy annoying ones, I settled on the MIT license, without the middle paragraph that requires attribution, because that's just pushy. Short and sweet - and the only objectionable thing is that LAWYERS SEEM TO NEED TO YOU TALK IN CAPS A LOT. Why is that? Maybe it makes judges like you more or something? Whatever - just go out there and do good stuff with it, people.
I am TomF\nEmailMe\n\n[[Link to this blog|http://www.eelpi.gotdns.org/blog.wiki.html]]\n[[RSS feed|http://www.eelpi.gotdns.org/blog.wiki.xml]]\n[[Main website|http://www.eelpi.gotdns.org/]]\n\nThe Blog Roll:\n[[My wife|http://eelpi.livejournal.com/]]\n[[Canis|http://www.lycanthrope.org/~canis/]]\n[[Mark Baker|http://technobubble.info/]]\n[[Dave Moore|http://onepartcode.com/main]]\n[[Wolfgang Engel|http://diaryofagraphicsprogrammer.blogspot.com/]]\n[[Jake Simpson|http://blog.jakeworld.org/]]\n[[Rich Carlson|http://www.digital-eel.com/blog/]]\n[[Ignacio Castaño|http://castano.ludicon.com/blog/]]\n[[My DirectX FAQ|http://tomsdxfaq.blogspot.com/]]\n\nWhatIsThisThing\n[[TiddlyWiki main|http://www.tiddlywiki.com/]]
Volker Schoenefeld took pity on my pathetic maths knowledge. I used to know all this stuff back in university, but it's atrophied. He sent me some links to some excellent tutorials and papers he's done on SH and the maths behind them. Probably the best place to start is his paper at [[http://heim.c-otto.de/~volker/prosem_paper.pdf|http://heim.c-otto.de/~volker/prosem_paper.pdf]]. He also has a Flipcode entry about it: [[http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=65274|http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=65274]] and the source to go with it [[http://heim.c-otto.de/~volker/demo.tgz|http://heim.c-otto.de/~volker/demo.tgz]]. Many thanks to you, Volker.
Well, it never rains but it pours. OK, not a true reinvention this time, but a sort of co-invention. Sony just announced their EDGE libraries recently and in them was a snazzy new vertex-cache optimiser, and some people at GDC asked me whether it was my algorithm. I had no idea - haven't written anything for the ~PS3 for ages, but I knew Dave Etherton of Rockstar ported [[my algorithm|Vertex Cache Optimisation]] to the ~PS3 with some really good results (he says 20% faster, but I think that's compared to whatever random ordering you get out of Max/Maya).\n\nAnyway, the EDGE one is actually a version of the [[K-Cache algorithm|http://www.ecse.rpi.edu/~lin/K-Cache-Reorder/]] tweaked and refined by [[Jason Hughes|http://www.dirtybitsoftware.com/]] for the particular quirky features of the post-transform cache of the RSX (the ~PS3's graphics chip). Turns out we do very similar broad-scale algorithms, but with slightly different focus - they tune for a particular cache, I try for generality across a range of hardware. It would be academically interesting to compare and contrast the two. However, they both probably achieve close enough to the ideal ACMR to ensure that vertex transformation is not going to be the bottleneck, which is all you really need.\n\nRemember - after ordering your triangles with vertex-cache optimisation, you then want to reorder your vertex buffer so that you're using vertices in a roughly linear order. This is extremely simple - it's exactly the algorithm you'd expect (scan through the tris, first time you use a vert, add it to the VB) but it's as much of a win in some cases as doing the triangle reordering.
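The vertex-buffer reorder really is as simple as it sounds. Here's a minimal sketch in Python, assuming an indexed triangle list - the function name and the plain-list representation are mine for illustration, not anything from EDGE or from my own optimiser:

```python
def reorder_vertices(indices, vertices):
    """Scan the index list in order; the first time a vertex is used,
    append it to the new vertex buffer. Returns (new_indices, new_vertices)."""
    remap = {}            # old vertex index -> new vertex index
    new_vertices = []
    new_indices = []
    for old in indices:
        if old not in remap:
            remap[old] = len(new_vertices)
            new_vertices.append(vertices[old])
        new_indices.append(remap[old])
    return new_indices, new_vertices

# Two triangles sharing an edge, with vertices stored in an unhelpful order.
tris = [2, 0, 3,  3, 0, 1]
verts = ["a", "b", "c", "d"]
new_tris, new_verts = reorder_vertices(tris, verts)
print(new_tris)
print(new_verts)
```

Run this after the triangle reordering, not before, since it depends on the final triangle order. As written it also drops any vertices that no triangle references, which is usually what you want.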
Mucky Foot Productions Ltd - a games company in Guildford, a few miles south west of London. Guildford used to be the heart of UK indie games development, but it's all gone a bit quiet now. I worked there from late 1999 through late 2003, when it all went pear-shaped and the company folded. Worked on some fun games.\n\nI joined in the closing stages of [[Urban Chaos]] (PC, ~PS1), and then did the job of porting it to the Dreamcast (which was a fabulous machine to work on). A fun game, if a little random in places. But full of character. To my knowledge, Darci is still the gaming world's only female black cop protagonist.\n\nThen I joined the StarTopia (PC) team and took over the graphics engine, which I still play with today (http://www.eelpi.gotdns.org/startopia/startopia.html). It's a fantastic "little computer people" game - available for about $3 on Amazon these days.\n\nThen we did [[Blade II]] (Xbox1, ~PS2) - I was mainly the Xbox1 engine side of things. Not a great game, but I'm pleased with the way the graphics turned out - I managed to put in a lot of nice tech such as auto-generated normal maps, sliding-window VIPM for most meshes, and almost every asset was streamed from disk (meshes, textures, sounds, physics hulls).\n\nAfter that we started two games in parallel with a new cross-platform engine which I headed up, but we only got a year or so into development before everything collapsed. Bit of a shame really - the engine was shaping up fairly well. Over the years I've tried to document some of the techniques used - click on the Streaming and VIPM tags in the right-hand menu.
Look, please try to remember, my surname ''DOES NOT HAVE AN E IN IT''. There are approximately three times as many Forsyths in the London phone directory as Forsythes, so it's not exactly uncommon. I wouldn't mind as much if people didn't add it when I'm //spelling// my name out for them. I'm saying "eff oh are ess why tee aych........." and they write an "e" and then ask me "oh, no E?" Well - there was that big silent bit where I could have easily said "E". I didn't suddenly forget or anything. Now you're going to have to write that cheque/receipt/permit out for me again, aren't you?
[[http://notepad-plus.sourceforge.net/|http://notepad-plus.sourceforge.net/]]. Like notepad, but better. Does word-wrap right, has a bunch of neat bracket-matching options, and a whole bunch of useful stuff I haven't played with yet. Launches in an instant. It doesn't replace a good custom text editor, but it's ace for hacking text files in arbitrary languages (XML, etc) when you don't have a spare few hours for Visual Studio to start up.\n\nAnd no, I'm not going to use emacs. Perverts.
I just know I'm going to get endless emails on the pedantry of this subject, but here's my brief round up of the background. You don't need to know any of this, it's just for geeky interest.\n\nThere's the [[Nyquist rate|http://en.wikipedia.org/wiki/Nyquist_rate]]. This has two meanings, but the interesting one is that if you have a limited bandwidth medium, you absolutely can't represent a signal of a frequency more than twice that of the medium. Put into graphics terms, you only have a certain number of pixels on the screen - that's your limited bandwidth medium. So you absolutely can't represent a signal (the texture) that is more than twice that size - there's no point even trying.\n\nThere's also the [[Nyquist frequency|http://en.wikipedia.org/wiki/Nyquist_frequency]]. It says a subtly different thing, which is that if you want to represent a signal of a certain frequency, you have to sample it at twice that frequency to avoid aliasing. However, this is not actually that relevant to texturing, because generally our signal is real life, which is much higher than can be achieved on a pixel display. So we're concerned with how to give a reasonable representation of a signal that is far beyond the capabilities of our hardware, not how to choose hardware to completely represent a signal.\n\nSome people also talk about the "Nyquist limit", but it appears to be an informal term. They are usually referring to how much information you can pump through a system without it looking bad. This is a different thing to the two above, which talk about theoretical limits, not practical ones. As ever in game graphics, the academia can give you a guide and set limits, but in the real-world we tend to make large assumptions and go with what looks good. But it's nice to know in this case that there is a real basis for the approximations that seem to work fine in practice.
A handy list of rules that I use to instantly decide if someone is an idiot coder or not. The way I decide this is that I tell them the list of rules, and if they're offended, then they're an idiot. If they give me useful arguments against any of my rules then they're not idiots, they are thoughtful but misguided. If they agree with all the rules instantly, they're sheep. If they argue but then agree, they are now true disciples. It's a very handy classification system, since it completely eliminates the possibility of people who might be smarter than me.\n\nThis list will grow with time. The chance of anybody agreeing with the entire list therefore tends towards zero. Thus the number of idiots in the world grows with every update. Fear the blog.\n\n* Double precision has no place in games. If you think you need double precision, you either need 64-bit fixed point, or you don't understand the algorithm. [[A matter of precision]]\n* Euler angles are evil, unless you're actually modelling a tank turret. And even then they're evil, they just happen to be an evil hard-wired into every tank turret. Try not to spread TEH EV1L further than strictly necessary.\n* Angles in general are evil. Any time you feel the urge to use angles, a vector formulation is likely to be more robust and have fewer transcendentals or branches.\n* If you can't write the assembly for the code you're writing, you're a danger to the project and other coders. Try something a bit more low-level until you do understand what you're writing.\n* If you think you can remember precedence rules beyond +-*/, you're only 99% right. Also, nobody else can remember them, so your code is now 1% buggy, and 100% unmaintainable. Use brackets.\n* OpenGL is retarded. DirectX is flaky, transient, confusing and politically-motivated, but it's not retarded.\n* [[Scene Graphs|Scene Graphs - just say no]] are useless and complex. Occam's Razor makes mincemeat of them.
We all know and love loose octrees, right? Really useful things. Have been used for around ten years in games - about twice that long in raytracers. Well, some doofi at Intel have patented them. [[http://www.google.com/patents?id=sH54AAAAEBAJ&dq=7002571|http://www.google.com/patents?id=sH54AAAAEBAJ&dq=7002571]] Filed in 2002. Not only is this well after they were in common use, but it's two years after the publication of Game Programming Gems 1 where [[Thatcher Ulrich|http://www.tulrich.com/geekstuff/index.html]] wrote the definitive article on them. The title of the article is the fairly obvious one - "Loose Octrees". Not exactly difficult to google for, should you be investigating a patent application on the subject, and indeed the top page found is Thatcher's. Understandably, he's not that thrilled about this either: [[9th May 2007 entry|http://www.tulrich.com/rants.html]]\n\nNormally I simply avoid reading about patents entirely, and I try not to pollute anyone else's brains with them because of the incredible abuses they are put to in this utterly broken system. But in this case it's too late - you already violated this one. You thought you were safe didn't you? Just because it was invented before most of us started programming, just because loads of people have used it in tons of shipped code, just because it's been discussed ad nauseam all over the internet, and just because it's been several years since actual publication in a popular hardback book - none of that stops some idiots from patenting the existing contents of your brain. But that's fine - the sooner someone gets sued by Intel for violation, the sooner the patent can be revoked //from orbit// for gratuitous and wanton disregard for prior art and obviousness. As Kevin Flynn put it - "the slimes didn't even change the name, man!".
Somebody asked for the slides to Michael Abrash's Pixomatic GDC 2004 talk the other day, and we at RadGameTools realised that they weren't online anywhere. So [[here they are|papers/pixomatic_gdc2004.ppt]] in a somewhat temporary place until we get them on the [[main site|http://www.radgametools.com/pixomain.htm]].\n\nSomeone also pointed out that the Dr. Dobbs Journal articles on Pixomatic are also available: [[part1|http://www.ddj.com/184405765]] [[part2|http://www.ddj.com/184405807]] [[part3|http://www.ddj.com/184405848]]. A fascinating journey into the world of the software renderer. Even in today's GPU-dominated world, it's not dead yet - and it's getting better.
A nice chap called Benedict Carter has been working on a fun zombies-vs-marines game called "Plague" - check the blog here: [[http://plague-like.blogspot.com/|http://plague-like.blogspot.com/]], and there's a download link there as well - you can get source or just a Windows EXE. Handy hint - the concussion grenades are really powerful, but have a huge range and you can easily kill yourself with them if you're not careful - stick with the incendiaries to start with (press the "9" key at the start of the game!). And before you start moaning it gets slow when you lob a bunch of molotovs around and set fire to huge forests - it's written in Python, and it's a fun home project, so shush.\n\nAnyway, why do I mention this? Because he got some inspiration for the way fire spreads from an old article on [["Cellular Automata for Physical Modelling"|http://www.eelpi.gotdns.org/papers/cellular_automata_for_physical_modelling.html]] that I wrote a while back, and was kind enough to tell me about it. I'm still not super happy with the article, and particularly the way it conflates "temperature" with "heat energy" - they're not the same thing at all of course. But it works after a fashion, and it's fairly explicit about how you need to balance having just enough realism to create the effect you want, without too much realism that it's hard to tweak or too slow to run. Game physics is not like real physics and should not be!\n\nSo, fun to see some old ideas made into an actual enjoyable game. And of course always fun to see what people get up to in their spare time. My brain has been so full of the SuperSecretProject for so long, my outside projects have basically stopped - hence no blog posts for so long. But the veil of secrecy is slowly lifting, so hopefully I'll be able to talk more about that stuff in a bit.
What we think of as conventional alpha-blending is basically wrong. The SRCALPHA:INVSRCALPHA style blending where you do FB.rgb = texel.rgb * texel.a + FB.rgb * (1-texel.a) is easy to think about, because the alpha channel blends between the texel colour (or whatever computed colour comes out the pixel shader - you know what I mean) and the current contents of the framebuffer. It's logical and intuitive. It's also rubbish.\n\nWhat you should use instead is premultiplied alpha. In DX parlance it's ONE:INVSRCALPHA, or FB.rgb = texel.rgb + FB.rgb * (1-texel.a) (and the same thing happens in the alpha channel). Notice how the texel colour is NOT multiplied by the texel alpha. It's a seemingly trivial change, but it's actually pretty fundamental. There's a bunch of reasons why, which I'll go into, but I should first mention that of course I didn't think of this and nor is it new - it's all in a 1984 paper [[T. Porter & T. Duff - Compositing Digital Images|http://keithp.com/~keithp/porterduff/]]. But twenty years later, most people are still doing it all wrong. So I thought I'd try to make the world a better place or something.\n\nAnyway, Porter & Duff talk about a lot of things in the paper so you might not want to read the whole thing right now, but the fundamentals of premultiplied alpha are easy. "Normal" alpha-blending munges together two physically separate effects - the amount of light this layer of rendering lets through from behind, and the amount of light this layer adds to the image. It ties the two together - you can't have a surface that adds a bunch of light to a scene without it also masking off what is behind it. Physically, this makes no sense - just because you add a bunch of photons into a scene, doesn't mean all the other photons are blocked. Premultiplied alpha fixes this by keeping the two concepts separate - the blocking is done by the alpha channel, the addition of light is done by the colour channel. 
This is not just a neat concept, it's really useful in practice.\n\n''Better compression''\nConsider a semi-translucent texture with a variety of colours in it. With conventional blending, the colour of alpha=0 texels is irrelevant you think - because hey - they get multiplied by alpha before being rendered. But it's not true - consider when the bilinear filtering is half-way between RGBA texels (1,0,0,1 = solid red) and (0,0,1,0 = transparent blue). The alpha value is obviously interpolated to 0.5, but the colour is also interpolated to (0.5, 0, 0.5), which is a rather ugly shade of purple. And this will be rendered with 50% translucency. So the colours of texels with alpha=0 are important, because they can be "pulled in" by the filtering.\n\nThe answer with normal alpha-blending is to do a flood-fill outwards, where texels with non-zero alpha copy their colours to texels with zero alpha. This is a pain to write though, and of course some alpha=0 texels have multiple neighbouring texels, and they have to do an average of them or something heuristic like that.\n\nThen you go to encode it as a DXTn texture, and you hit more problems. First of all, DXT1 can encode alpha=0 texels, but it can't encode them with any colour other than black. So all your flood-filling was pointless, and black is going to bleed in. I've actually seen this in shipping games - the edges of translucent stuff are bizarrely dark. For example, you get a brightly-lit green leaf rendered against a bright blue sky, so the result of any blending between them should be bright, but there's a dim "halo" around the leaves. Silly developer!\n\nNow OK, DXT1 isn't all that useful for alpha textures because although the top-level mipmap might only have alpha=0 or alpha=1, as soon as you start to make mipmap levels you need some intermediate levels. So let's try DXT3 or DXT5. Same problems with both of those. First of all, the bleeding of course, so fine, you write a flood-filler, whatever. 
The next problem is when you come to compress the texture. Your flood filler has made sure neighbouring texels are something like the solid ones, but they will be different, especially if the flood-filler has had to blend them. But DXTn compression does badly as the number of colours increases, and the flood-filler just added texel colours. What's worse, they're not particularly significant colours (as mostly they're invisible, it's just the "halo" effect you're trying to prevent). But it's hard to tell most DXTn compressors about this, so they see a bunch of texels that have very different colours, and try to satisfy all of them. What happens visually is that you get significantly worse compression around the translucent edges of textures than in the opaque middle, which is very counter-intuitive.\n\nOK Mr. Smarty-pants, what's the answer? Well, premultiplied alpha. All you do is before you compress, you multiply the colour channel by the alpha channel, and during rendering change blend mode from SRCALPHA:INVSRCALPHA to ONE:INVSRCALPHA. That's it - it's not rocket-science.\n\nSo what now happens is all the alpha=0 texels are now black. Wait - but you'll get bleeding and halos! No, you don't. Let's do the half-texel example again. Let's say we have an entirely red texture, but with some bits alpha'd, and we render onto a green background. You'd expect to get shades of red, green and yellow - but the darkest yellow should be around (0.5,0.5,0) - no dark halos of something like (0.25,0.25,0), right? So (1,0,0,1 = solid red) and (1,0,0,0 = transparent red). The second one gets premultiplied before compression to (0,0,0,0). Now we bilinear filter between them and get (0.5, 0, 0, 0.5). And then we render onto bright green (0,1,0).\n\nFB.rgb = texel.rgb + (1-texel.a) * FB.rgb\n= (0.5, 0, 0) + (1-0.5) * (0,1,0)\n= (0.5, 0, 0) + (0, 0.5, 0)\n= (0.5, 0.5, 0)\nwhich is exactly what we were expecting. No dark halos. 
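If you want to poke at the half-texel example yourself, here's a minimal sketch in Python. The helper names are made up for illustration, and `lerp` at t=0.5 stands in for the bilinear filter landing exactly half-way between two texels. It compares what happens when a transparent texel gets stored as black (as DXT1 forces) under each blend mode.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation, standing in for bilinear filtering."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def premultiply(rgba):
    """Multiply the colour channels by alpha, as you would before compressing."""
    r, g, b, a = rgba
    return (r * a, g * a, b * a, a)

def blend_straight(texel, fb):
    """SRCALPHA:INVSRCALPHA - FB.rgb = texel.rgb * texel.a + FB.rgb * (1-texel.a)."""
    a = texel[3]
    return tuple(c * a + f * (1 - a) for c, f in zip(texel[:3], fb))

def blend_premultiplied(texel, fb):
    """ONE:INVSRCALPHA - FB.rgb = texel.rgb + FB.rgb * (1-texel.a)."""
    a = texel[3]
    return tuple(c + f * (1 - a) for c, f in zip(texel[:3], fb))

solid_red = (1.0, 0.0, 0.0, 1.0)
stored_transparent = (0.0, 0.0, 0.0, 0.0)  # black is the only colour DXT1 can store at alpha=0
green_fb = (0.0, 1.0, 0.0)

# Premultiplying changes neither texel here, so both paths filter the same data.
assert premultiply(solid_red) == solid_red
assert premultiply(stored_transparent) == stored_transparent
filtered = lerp(solid_red, stored_transparent, 0.5)

halo = blend_straight(filtered, green_fb)        # red gets multiplied by alpha a second time
clean = blend_premultiplied(filtered, green_fb)  # matches the worked result above
print(halo, clean)
```

Same texture data, same filtering - the only difference is the blend mode, and only the premultiplied one keeps the red contribution at full strength.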
And since when alpha=0, the colour we WANT to encode is black, DXT1 does exactly the right thing. Lucky that, eh? Well no, not really - DXT1 is ''meant'' to be used with premultiplied alpha (even though the docs don't say this).\n\nBetter still, when you encode the texture into DXT3, all the edge texels are black, whatever their neighbours are. So that's just a single colour for the encoder to fit, rather than a bunch of them. In general, this leads to more consistent compression quality for translucent textures.\n\nBy the way - DirectX has the two "premultiplied alpha" compressed texture formats DXT2 and DXT4. I'm not sure why they added them - they are rendered identically to DXT3 and DXT5. The fact that they're different formats is meant to be used as meta-data by the application using them. Thing is - you usually need far far more meta-data than that - is the texture a normal map, does the alpha channel hold translucency or is it a specular map instead, etc. Seems pointless to me. It looks like they're going away in DX10 which is good (DXT2/3 become BC2, DXT4/5 become BC3). In general I'd avoid using DXT2 and DXT4 - I have seen (early) drivers that did something strange when you used them - I think it was actually trying to do the multiply for me, which is not what I wanted. Stick to DXT3 and DXT5 for sanity.\n\n''Compositing translucent layers''\nSome rendering techniques want to composite two translucent layers together, and then render the resulting image onto a third. The common one I've seen (and used) is for impostors. This is where you take a high-polygon object in the distance and render it to a texture once. 
Then, as long as the camera doesn't move too much, you can just keep drawing the texture as a billboard, without re-rendering the object itself and burning all that vertex & pixel shader power on a visually small object.\n\nIt's rather more complex than that, but the fundamental point is that you're doing a render to a texture of an image, then rendering that to the screen. The problem comes when the object itself has multiple translucent textures that might overlap. So the effect you're looking for is as if you did a standard render - background, then translucent tex1, then translucent tex2. The problem is, what you're actually doing is rendering tex1 then tex2 to an intermediate, then rendering that onto the background. Let's do some maths. Assuming normal alpha-blending, what we want is:\n\nRender tex1:\nfb2.rgb = (tex1.rgb * tex1.a) + ((1-tex1.a) * fb1.rgb)\nRender tex2:\nfb3.rgb = (tex2.rgb * tex2.a) + ((1-tex2.a) * fb2.rgb)\n= (tex2.rgb * tex2.a) + ((1-tex2.a) * ((tex1.rgb * tex1.a) + ((1-tex1.a) * fb1.rgb)))\n\nSo now we try to emulate this with our impostoring:\n\nRender tex1:\nimpostor1.rgb = tex1.rgb\nimpostor1.a = (1-tex1.a)\nRender tex2:\nimpostor2.rgb = (tex2.rgb * tex2.a) + ((1-tex2.a) * impostor1.rgb)\nimpostor2.a = well ... er ... that's the question - what do I do with the alpha channel? Add? Multiply?\nRender impostor to FB:\nfb2.rgb = (impostor2.rgb * impostor2.a) + ((1-impostor2.a) * fb1.rgb)\n\n...and you'll find that whatever blend modes you choose, you basically get a mess.\n\nBut if we use premultiplied alpha everywhere, it's a doddle. 
What we want is:\n\nRender tex1:\nfb2.rgb = (tex1.rgb) + ((1-tex1.a) * fb1.rgb)\nRender tex2:\nfb3.rgb = (tex2.rgb) + ((1-tex2.a) * fb2.rgb)\n= (tex2.rgb) + ((1-tex2.a) * ((tex1.rgb) + ((1-tex1.a) * fb1.rgb)))\n= (tex2.rgb) + ((1-tex2.a) * (tex1.rgb)) + ((1-tex2.a) * (1-tex1.a) * fb1.rgb)\n\nAnd we're actually doing:\n\nRender tex1 (impostor starts at (0,0,0,0)):\nimpostor1.rgb = (tex1.rgb) + ((1-tex1.a) * (0,0,0))\nimpostor1.a = (tex1.a) + ((1-tex1.a) * 0)\nRender tex2:\nimpostor2.rgb = (tex2.rgb) + ((1-tex2.a) * impostor1.rgb)\n= (tex2.rgb) + ((1-tex2.a) * tex1.rgb)\nimpostor2.a = (tex2.a) + ((1-tex2.a) * (impostor1.a))\n= (tex2.a) + ((1-tex2.a) * (tex1.a))\nRender impostor to FB:\nfb2.rgb = (impostor2.rgb) + ((1-impostor2.a) * fb1.rgb)\n= (tex2.rgb) + ((1-tex2.a) * tex1.rgb) + ((1-((tex2.a) + ((1-tex2.a) * (tex1.a)))) * fb1.rgb)\n\nThat scary-looking (1-((tex2.a) + ((1-tex2.a) * (tex1.a)))) bit looks horrendous but actually reduces to a nice simple ((1-tex1.a) * (1-tex2.a)), so the final result is:\nfb2.rgb = (tex2.rgb) + ((1-tex2.a) * tex1.rgb) + ((1-tex1.a) * (1-tex2.a) * fb1.rgb)\n...which is exactly what we want - yay! So you can see that premultiplied-alpha-blending is associative, i.e. ((a @ b) @ c) is the same as (a @ (b @ c)) (where @=the alpha-blend operation), which is incredibly useful for all sorts of image compositing operations.\n\n''Multipass lighting techniques''\nA common thing in rendering is to render an object multiple times, once per light. This means you can do shadowing with things like stencil volume shadows that can only deal with a single light per pass. The problem comes with translucent textures. 
You still want them to be shadowed, but the math doesn't work:\n\nWhat we want is:\nfb2.rgb = ((tex.rgb * (light1.rgb + light2.rgb))*tex.a) + ((1-tex.a) * fb1.rgb)\n\nBut pass 1:\nfb2.rgb = (tex.rgb * light1.rgb * tex.a) + ((1-tex.a) * fb1.rgb)\nPass 2:\nfb3.rgb = (tex.rgb * light2.rgb * tex.a) + ((1-tex.a) * fb2.rgb)\n= (tex.rgb * light2.rgb * tex.a) + ((1-tex.a) * ((tex.rgb * light1.rgb * tex.a) + ((1-tex.a) * fb1.rgb)))\nOh dear, that's a mess. Let's try using additive blending for the second pass instead. After all, that's what we'd use for opaque objects.\nfb3.rgb = (tex.rgb * light2.rgb) + (fb2.rgb)\n= (tex.rgb * light2.rgb) + ((tex.rgb * light1.rgb * tex.a) + ((1-tex.a) * fb1.rgb))\n= (tex.rgb * (light2.rgb + (light1.rgb * tex.a))) + ((1-tex.a) * fb1.rgb)\n\n...which is closer, but still not right. OK, let's try with premultiplied alpha. What we want is:\nfb2.rgb = (tex.rgb * (light1.rgb + light2.rgb)) + ((1-tex.a) * fb1.rgb)\n\nPass 1:\nfb2.rgb = (tex.rgb * light1.rgb) + ((1-tex.a) * fb1.rgb)\nPass 2 uses additive blending, just like opaque objects:\nfb3.rgb = (tex.rgb * light2.rgb) + (fb2.rgb)\n= (tex.rgb * light2.rgb) + ((tex.rgb * light1.rgb) + ((1-tex.a) * fb1.rgb))\n= (tex.rgb * (light1.rgb + light2.rgb)) + ((1-tex.a) * fb1.rgb)\n\nMagic! Again, it actually makes perfect sense if you think about each pass and ask how much light each one lets through, and how much light each adds, and then realise that actually there's three passes - the first one occludes the background by some amount, the second adds light1, the third adds light2. And all we do is combine the first two by using the ONE:INVSRCALPHA blending op.\n\n''Additive and blended alpha in a single operation''\nIn some places in your game, you want "lerp" materials - standard alpha-blending ones that occlude the background to some extent and add their own colour in to an extent - for example, smoke particle systems are like this. 
In other places, you just want to do an additive blend without occluding the background - for example a fire particle system. So we have two very common blending modes usually called "add" and "lerp" or somesuch. But what if you want a material with bits of both? What if you want a single particle system that has additive flame particles turning into sooty lerping particles as they age? You can't change renderstate in the middle of a particle system, that's silly. Who can help us now? Why - it's Premultiplied Alpha Man - thank god you're here!\n\nBecause of course if you're using premultiplied alpha, what happens if you just set your alpha value to 0 without changing your RGB value? In normal lerp blending, what you get is a fully translucent surface that has no effect on the FB at all. In premultiplied alpha, you... er ... oh, it's additive blending! Blimey, that's a neat trick. So you can have particles change from additive to lerp as they get older - all you do is move the alpha value from 0 towards 1, and the texture colour from a fiery red/yellow towards a dark sooty colour. No renderstate changes needed, and it's a nice smooth transition. The same trick works for mesh textures where you want both glows and opaque stuff - one example is a neon sign where the actual tubes themselves and things like the support brackets or whatever are opaque, but the (faked) bloom effect from the tubes is additive - you can do both in a single pass.\n\n\nSo anyway, yeah, premultiplied alpha. Use it, love it, pass it on. Then maybe we'll only take another 20 years til everyone's doing this stuff correctly.
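If you'd rather poke at the numbers than trust my algebra, here's a minimal sketch of the three claims above: the half-texel example, the impostor compositing, and the alpha=0 additive trick. It's plain Python on RGBA tuples rather than real blend hardware, and every function name in it is made up for illustration. The one assumption worth flagging is that ONE:INVSRCALPHA is applied to the alpha channel as well as to the colour, which is what the impostor maths above relies on.

```python
def premultiply(rgba):
    """Multiply the colour channels by alpha, as you would before compressing."""
    r, g, b, a = rgba
    return (r * a, g * a, b * a, a)


def filter_midpoint(p, q):
    """Stand-in for bilinear filtering: sample exactly half way between two texels."""
    return tuple((x + y) * 0.5 for x, y in zip(p, q))


def blend_premultiplied(src, dst):
    """ONE:INVSRCALPHA on all four channels: result = src + (1 - src.a) * dst."""
    inv = 1.0 - src[3]
    return tuple(s + inv * d for s, d in zip(src, dst))


def nearly_equal(p, q, eps=1e-9):
    """Compare two colours with a small tolerance for float rounding."""
    return all(abs(x - y) <= eps for x, y in zip(p, q))


GREEN = (0.0, 1.0, 0.0, 1.0)
EMPTY = (0.0, 0.0, 0.0, 0.0)

# 1. Half-texel example: solid red next to "transparent red".
half_texel = filter_midpoint(premultiply((1, 0, 0, 1)), premultiply((1, 0, 0, 0)))
on_green = blend_premultiplied(half_texel, GREEN)

# 2. Impostors: two arbitrary (already premultiplied) layers, drawn directly
#    onto the framebuffer versus composited into an empty impostor first.
tex1 = (0.2, 0.1, 0.0, 0.4)
tex2 = (0.3, 0.0, 0.3, 0.5)
direct = blend_premultiplied(tex2, blend_premultiplied(tex1, GREEN))
impostor = blend_premultiplied(tex2, blend_premultiplied(tex1, EMPTY))
via_impostor = blend_premultiplied(impostor, GREEN)

# 3. Alpha of 0 with non-zero colour behaves as a purely additive blend.
flame = (0.8, 0.4, 0.0, 0.0)
background = (0.1, 0.2, 0.3, 1.0)
flame_result = blend_premultiplied(flame, background)

print(on_green)                             # the half-texel result over green
print(nearly_equal(direct, via_impostor))   # do the two orderings agree?
print(flame_result)                         # background plus flame colour
```

The impostor comparison only checks one pair of layers, so it's an illustration of the associativity argument, not a proof of it - the algebra above is the proof.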
It's the bit of circuitry on a video card that reads the framebuffer out of the RAM, puts it through a Digital to Analogue Convertor, and feeds it to your monitor. RAM+DAC you see. In the old days when VGA was a card, not a cable standard, that's pretty much all a video card was - some memory and a thing to output it to the monitor. People still use the term RAMDAC to refer to the chip that outputs DVI, even though it doesn't actually have a DAC (Digital to Analogue Convertor) in it.
Now with RSS feeds - which is handy in case I accidentally post something useful.\n\nI'm a total newb on interweb tech, but RSS is fascinating - there appear to be at least five different versions, apart from the one called ATOM which isn't a different version, it's a completely different thing altogether (...and cue Airplane gag). "Open Standards" are such an oxymoron. Precisely because they're open, they're not standard. I mean, if they were standard, you couldn't change them at will, so they wouldn't be open. So the only truly Open Standard is one you just invented that nobody knows about yet.\n\nIf an Open Standard is RFC'd in a forest, but there's no slashdot post, does anyone file a submarine patent on it?
Hmmm... not that impressed by the automagic RSS feed generator on these TiddlyWikis. There's probably lots of cunning ways to tweak it with embedded commands, but I'd have to RTFM. The problem is that every time I touch a page it gets put to the top, even if it's just minor tweaks (e.g. adding tags or correcting typos), or adding tiddlers that are not directly part of the blog. Also, the "description" field isn't much of a description - it's the entire text of the entry. I'm sure that's not the intent of that field. So I might try doing it by hand for a bit - just add the blog bits myself. The format isn't exactly rocket-science now I have a template.\n\nTell you what, I'll set up both - the really verbose automatic one and a hand-done one that will just have the blog headlines. Then you can choose which you like best. By the way, the RSS feed for this page is the same as the web address but with .xml instead of .html. Firefox does that automagically and gives you a little orange icon to click, but it looks like a lot of other things don't, so I've added a manual link at the left hand side.\n\nIf you're still having problems with the RSS feed, yell. I'm not a big RSSer myself, so I don't really know what standard practice is for that stuff, and I'd like to tweak it now, get it right and then forget about it :-)
In other news - I think RSS is a good idea waiting for a decent implementation. There's two ways to do this - I can make the manually-edited feed have the complete text with the Wiki stuff edited out, but then you can't use the Wiki links, which is kinda the whole point of using a Wiki. Or I can continue to make it just have like a single-line description of the post, and any decent RSS reader should include the link to the full post you can click on if the one-line teaser intrigues you.\n\nSo, both of you out there using RSS, bet now! Bet bet bet bet bet! It's even easier now I added an email address to the menu on the left :-)
http://www.radgametools.com/ Where I used to work before joining Intel. Rad products include [[Bink|http://en.wikipedia.org/wiki/Bink_video]], Miles, Pixomatic and [[Granny3D]].\n\nRad includes luminaries like [[Michael Abrash|http://en.wikipedia.org/wiki/Michael_Abrash]], Mike Sartain, Jeff Roberts, Dave Moore and John Miles (yes, that's why it's called the "Miles Sound System").\n\nTechnically I still work //at// Rad, I just don't work //for// Rad. It's complex.
Deano Calver has a neat article comparing ray-tracing and rasterisation. http://www.beyond3d.com/content/articles/94\n\nI basically agree with it - ray-tracing has a limited number of things it's good at, and a large number of things it's very bad at, and a bunch of things people think it's good at that are not raytracing at all and can be applied to either system. The arguments I always have with people are about scalability - how well do the two methods scale as your amount of content rises?\n\nTo put it in a very simplistic way, raytracing takes each pixel and then traverses the database to see what object hit that pixel. Whereas rasterisation takes the database and traverses the screen to see what pixels each object hit. In theory they sound the same, but with the inner and outer loops swapped (pixels vs objects) - should be the same, right?\n\nThe problem is that rasterisation has some very simple ways of not checking every single pixel for each object, e.g. bounding boxes of objects and triangles, and rasterisation algorithms that know what shape triangles are. Raytracing also has a bunch of ways of not checking every object for each pixel, but those methods are not as simple for the same win, and can take time to reconstruct when the scene changes.\n\nThe other fundamental problem is the thing in the inner loop - changing from one pixel to the next is pretty easy for a rasteriser - pixels are pretty simple things, laid out in a nice regular grid. The rasteriser can keep around a lot of information from pixel to pixel. The raytracer on the other hand is stepping through objects in the scene - they're not inherently nicely ordered, and the state of object #1 says nothing about the state of object #2. So logically you'd want objects in your outer loop and pixels in your inner loop.\n\nThat's an absurdly simplistic way of looking at things of course, but it's a handy O(n)-notation way of thinking about it. 
You can then start digging and find lots of other reasons why one is slightly more efficient than the rest (after all, bubblesorts are quicker than mergesorts in small lists in architectures with high branch penalties), but you still have to work very hard to overcome the ~BigO problem. And of course the real world shows a lot of harsh realities.\n\nWe'll put aside the massive dominance of hardware rasterisers, because it is somewhat of a local-minimisation problem, in that what the hardware companies know how to build is rasterisers, and they don't know how to build raytracers, so that's what they build, and therefore that's what people program for. But in the offline rendering world, where the hardware is the same - general-purpose ~CPUs - the dominant mechanism is still the rasteriser, in the form of the REYES system. It's not your standard rasteriser of course - there's a long way between what it does and what graphics cards currently do - but nevertheless the fundamental principle holds - for each object, they see what pixels it hits. Not the other way round.\n\nRaytracing has its places - it's just not The Future. But then it's possible neither is rasterising - currently neither has a great solution to the lighting problem.
Ignacio Castaño recently looked into some explicit ordering methods for regular grids. [[http://castano.ludicon.com/blog/2009/02/02/optimal-grid-rendering/|http://castano.ludicon.com/blog/2009/02/02/optimal-grid-rendering/]]. It was kinda neat to see that [[my algorithm|http://www.eelpi.gotdns.org/papers/fast_vert_cache_opt.html]] does really well considering it's cache-size-blind. I'm not quite sure why the weakness at size=12, but when writing it I did observe that on large regular grids it would get a bit degenerate and start to do long double-wide strips. Real game meshes have sections of regular grid (e.g. 10x10 chunks), but those are small enough that the algorithm quickly hits discontinuities that kick it out of the double-wide mode and into the more Hilbert-like mode.\n\nI have thought about removing what I think is the algorithm's biggest weakness - the first few triangles, when it doesn't have any data in the cache and it's basically making random choices. So my theory was - run the algorithm until half the mesh is done. Then throw the first 100 tris back in the "available" pool and run until the end.\n\nThe fun thing is that when I originally wrote the algorithm, it just needed to be ~OKish, it didn't need to be good. The real priority was it had to be fast and space-efficient, because it was running as part of the Granny3D export pipeline from Max/Maya, and any sort of half-decent ordering was better than the essentially random ordering we were getting. It was a nice accident that it turned out to be really good with only an extra day's work.
Someone just pointed out Kurt Pelzer's article in Game Programming Gems 4 called "Combined Depth and ID-Based Shadow Buffers". I don't have the book (which is stupid of me), so I can't check, but there's really no magic to the algorithm except the realisation that you //can// combine the two. After that it's all fairly obvious, so we probably have much the same technique. So I just reinvented somebody else's wheel, which is stupid of me - sorry about that Kurt.
Renderstate changes cost valuable clock cycles on both the CPU and GPU. So it's a good idea to sort your rendering order by least-cost. But you need to be careful you know what "least-cost" means - it's not just the number of ~SetRenderState calls you send down!\n\n(note - I'm going to suggest some numbers in this section - all numbers are purely hypothetical - they are realistic, but I have deliberately avoided making them the same as any hardware I know about - they're just for illustration)\n\nTypically, graphics hardware is driven by a bunch of internal bits that control the fixed-function units and the flow of the overall pipeline. The mapping between those bits and the renderstates exposed by the graphics API is far from obvious. For example, the various alpha-blend states look simple in the API, but they have large and wide-ranging implications for the hardware pipeline - depending on what you set, various types of early, hierarchical and late Z will be enabled or disabled. In general, it is difficult to predict how some render states will affect the pipeline - I have written graphics drivers, I have an excellent knowledge of graphics hardware, and I'm responsible for some right now, and I //still// have a hard time predicting what a change in a single renderstate will do to the underlying hardware.\n\nTo add to the complexity, a lot of modern hardware can have a certain limited number of rendering pipeline states (sometimes called "contexts", in a slightly confusing manner) in flight. For further complexity, this number changes at different points in the pipeline. For example, some hardware may be able to have 2 pixel shaders, 2 vertex shaders, 4 sets of vertex shader constants in flight at once, and 8 different sets of textures (again, numbers purely for illustration). If the driver submits one more than this number at the same time, the pipeline will stall. 
Note the differing numbers for each category!\n\nIn general, there are some pipeline changes which are cheaper than others. These are "value" changes rather than "functional" changes. For example, changing the alpha-blend state is, as mentioned, a functional change. It enables or disables different things - the ordering of operations in the pipeline changes. This can be expensive. But changing e.g. the fog colour is not a functional change, it's a value change. Fog is still on (or off), but there's a register that changes its value from red to blue - that's not a functional change, so it is usually fairly cheap. Be aware that there are some that look like "value" changes that can be both. For example, Z-bias looks like just a single value that changes - should be cheap, right? But if you change that value to 0, then the Z-bias circuitry can be switched off entirely, and maybe the pipeline can be reconfigured to go faster. So although this looks like a value change, it can be a functional change as well.\n\nWith that in mind, here's a very general guide to state change costs for the hardware, ordered least to most:\n\n* Changing vertex, index or constant buffers (~DX10) for another one ''of the same format''. Here, you are simply changing the pointer to where the data starts. Because the description of the contents hasn't changed, it's usually not a large pipeline change.\n\n* Changing a texture for another texture identical in every way (format, size, number of mipmaps, etc) except the data held. Again, you're just changing the pointer to the start of the data, not changing anything in the pipeline. Note that the "same format" requirement is important - in some hardware, the shader hardware has to do float32 filtering, whereas fixed-function (i.e. fast) hardware can do float16 and smaller. Obviously changing from one to the other requires new shaders to be uploaded!\n\n* Changing constants, e.g. shader constants, fog color, transform matrices, etc. 
Everything that is a //value// rather than an enum or bool that actually changes the pipeline. Note that in ~DX10 hardware, the shader constants are cheap to change. But in ~DX9 hardware, they can be quite expensive. A "phase change" happened there, with shader constants being moved out of the core and into general memory.\n\n* Everything else - shaders, Z states, blend states, etc. These tend to cause widescale disruption to the rendering pipeline - units get enabled, disabled, fast paths turned on and off. Big changes. I'm not aware of that much cost difference between all these changes.\n\n(note that changing render target is even more expensive than all of the above - it's not really a "state change" - it's a major disruption to the pipeline - first and foremost, order by render target)\n\nOK, so that's the ''hardware'' costs. What about the ''software'' costs?\n\nWell, there's certainly the cost of actually making the renderstate change call. But that's usually pretty small - it's a function call and storing a DWORD in an array. Do not try to optimise for the least number of ~SetRenderState calls - you'll spend more CPU power doing the sorting than save by minimising the number of calls. Total false economy.\n\nAlso note that the actual work of state changes doesn't happen when you do the ~SetRenderState. It happens the next time you do a draw call, when the driver looks at the whole state vector and decides how the hardware has to change accordingly - this is called "lazy state changes", and every driver does it these days. Think of it as a mini-compile stage - the driver looks at the API specification of the pipeline and "compiles" it to the hardware description. In some cases this is literally a compile - the shaders have to be changed in some way to accommodate the state changes. This mini-compile is obviously expensive, but the best drivers cache states from previous frames so that it doesn't do the full thinking every time. 
Of course, it still has to actually send the new pipeline to the card, and stall when the card has too many contexts in flight (see above). So what tends to happen is that on the CPU side, there's states that cause a new pipeline (tend to be higher on the list above), and states that don't (tend to be lower). But there's a much lower difference in cost between top and bottom.\n\n(note that although the SuperSecretProject is a software renderer, you still have a certain cost to state changes, and those costs are reflected in the above data. It's not exactly a coincidence that a software renderer and hardware renderers have similar profiles - they have to do the same sort of things. Our consolation is that we don't have any hard limits, so in general the cost of changing renderstates is much lower than hardware).\n\nOf course it's also a good idea to sort front-to-back (for opaque objects), so that has to be reconciled with sorting by state. The way I do it is to sort renderstates hierarchically - so there's four levels according to the above four items. Generally I assign a byte of hash per list item, and I concatenate the bytes to form a uint32 (it's a lucky accident there's four items - I didn't plan it that way). Also note this hash is computed at mesh creation time - I don't go around making hashes at runtime.\n\nSo now I have a bunch of hashed renderstates, what order do I put them in? What I generally do is assume the first two types of state change - changing constant buffers and texture pointers - are much cheaper than the others (also, you don't tend to change shader without changing shader constants and textures anyway, so it's a reasonable approximation). So I make some "sub-buckets" of the objects that share the same state in the last two items. In each sub-bucket I pick the closest object to the camera. Then I sort the sub-buckets according to that distance (closest first), and draw them in that order. 
Within each bucket, I draw all the objects that have each renderstate in whatever order they happen to be - all I care about is getting all the objects for one state together. This gets me a fair bit of early-Z rejection without spamming the hardware with too many expensive changes.\n\nIn theory by sorting within each sub-bucket you could get slightly better driver/hardware efficiency, but I suspect the effort of sorting will be more expensive than the savings you get. I've not measured this, so this is just my experience talking. By all means if you have the time, test these hunches and let me know if my intuition is wide of the mark. It wouldn't be the first time.\n\nNote that obviously I do the "first nearest object" as I'm putting the objects into the sub-bins, not as a later sorting stage, and there's a bunch of other obvious implementation efficiencies. And I use a bucket-sort for Z rather than trying for high precision. So although it sounds complex (it's surprisingly difficult to explain in text!), the implementation is actually fairly simple and the runtime cost is very low.
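Since I said it's hard to explain in text, here's a rough sketch of the hash-and-sub-bucket scheme in Python. Treat it as an illustration under assumptions and not as the code I actually ship: the Mesh fields and function names are invented, the byte order inside the uint32 (expensive states in the top half) is just one way to do it, the hashes would really be computed once at mesh creation, and it uses a plain sort on the nearest distance where I'd actually use a coarse bucket-sort.

```python
from collections import namedtuple

# Hypothetical mesh record: one byte of hash per cost level from the list above,
# plus the distance from the camera for this frame.
Mesh = namedtuple(
    "Mesh",
    "name buffer_hash texture_hash constant_hash pipeline_hash distance",
)


def state_key(mesh):
    """Concatenate the four per-level hash bytes into one 32-bit key.

    Assumption: the two expensive levels go in the top 16 bits, so that
    shifting right by 16 leaves just the part we bucket on.
    """
    return (
        ((mesh.pipeline_hash & 0xFF) << 24)
        | ((mesh.constant_hash & 0xFF) << 16)
        | ((mesh.texture_hash & 0xFF) << 8)
        | (mesh.buffer_hash & 0xFF)
    )


def draw_order(meshes):
    """Group meshes by expensive state, then order the groups nearest-first."""
    sub_buckets = {}  # expensive half of the key -> [nearest distance, meshes]
    for mesh in meshes:
        expensive = state_key(mesh) >> 16
        bucket = sub_buckets.setdefault(expensive, [mesh.distance, []])
        # Track the nearest object while filling the bucket, not as a later pass.
        bucket[0] = min(bucket[0], mesh.distance)
        bucket[1].append(mesh)

    # Closest sub-bucket first; inside a sub-bucket the order is left alone.
    ordered = sorted(sub_buckets.values(), key=lambda bucket: bucket[0])
    return [mesh for _, bucket_meshes in ordered for mesh in bucket_meshes]


scene = [
    Mesh("far_wall", 1, 1, 0, 1, 50.0),
    Mesh("near_glass", 2, 2, 0, 2, 10.0),
    Mesh("near_wall", 1, 3, 0, 1, 5.0),
    Mesh("mid_glass", 2, 2, 0, 2, 30.0),
]
order_names = [mesh.name for mesh in draw_order(scene)]
print(order_names)
```

Here the two walls share their expensive state, so they get drawn together and go first because near_wall is the closest thing in the scene, even though far_wall on its own is the furthest.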
The range of names is mind-boggling.\nSSE\n~SSE2\n~SSE3\n~SSSE3\n~SSE4\n~SSE4.1 (maybe it's like ~SSE4 with an extra low-frequency channel?)\n~SSE4.2\n...and now ~SSE5\n\nI'll be so glad when this instruction set is replaced by something more sensible. Although ~SSE5 does finally add ternary instructions and multiply-add.
A [[Scene Graph|http://en.wikipedia.org/wiki/Scene_graph]] is essentially a method of rendering where you place your entire world into this big graph or tree with all the information needed to render it, and then point a renderer at it. The renderer traverses the graph and draws the items. Often, the edges between the nodes attempt to describe some sort of minimal state changes. The idea is fairly simple - we're computer scientists, so we should use fancy data structures. A tree or graph is a really cool structure, and it's great for all sorts of things, so surely it's going to be good for rendering, right?\n\nIt's a great theory, but sadly it's completely wrong. The world is not a big tree - a coffee mug on a desk is not a child of the desk, or a sibling of any other mug, or a child of the house it's in or the parent of the coffee it contains or anything - it's just a mug. It sits there, bolted to nothing. Putting the entire world into a big tree is not an obvious thing to do as far as the real world is concerned.\n\nNow, I'm not going to be a total luddite and claim that trees are not useful in graphics - of course they are. There's loads of tree and graph structures all over the place - BSP trees, portal graphs, skeletal animation trees, spatial octrees, partitioning of meshes into groups that will fit in memory (bone-related or otherwise), groups of meshes that are lit by particular lights, lists of objects that use the same shader/texture/post-processing effect/rendertarget/shadowbuffer/etc. Tons of hierarchical and node/edge structures all over the place.\n\nAnd that's the point - there's tons of them, they all mean different things, and (and this is the important bit) they don't match each other. If you try to put them all into one big graph, you will get a complete bloody mess. The list of meshes that use shadowbuffer X are completely unrelated to the list of meshes that use skeleton Y. 
I've seen many Scene Graphs over the years, and typically they have a bunch of elegant accessors and traversers, and then they have a gigantic list of unholy hacks and loopholes and workarounds and suchlike that people have had to add to get them to do even the simplest demos. Try and put them in a big, real, complex game and you have a graph with everything linked to everything. Either that, or you have a root node with fifty bazillion direct children in a big flat array and a bunch of highly specialised links between those children. Those links are the other structures that I mentioned above - but now they're hard to access and traverse and they clutter up everything.\n\nFundamentally of course you do have to resolve a traversal order somehow - the objects need rendering, and there's some sort of mostly-optimal way to do it. But you're just never going to get that ordering by just traversing a single tree/graph according to some rules. It's fundamentally more complex than that and involves far more tradeoffs. Do you traverse front to back to get early-Z rejection? Do you traverse by shadowbuffer to save switching rendertargets? Do you traverse by surface type to save switching shaders? Do you traverse by skeleton to keep the animation caches nice and warm? All of this changes according to situation and game type, and that's what they pay us graphics coders the big bucks for - to make those judgement calls. You can't escape these problems - they're fundamental.\n\nBut trying to fit them all into one uber-graph seems like madness. This is compounded by the fact that most Scene Graphs are implicitly a "retained mode" paradigm, and not an "immediate mode" paradigm. My esteemed friend [[Casey Muratori|https://mollyrocket.com/forums/]] has some uncomplimentary comments on doing things with retained mode concepts, and I'm very inclined to agree. Yes, they can work, and sometimes they're necessary (e.g. 
physics engines), but it's not what I'd choose given an alternative.\n\nThe one major argument I hear presented for Scene Graphs is the "minimal state change" concept. The idea is that the edges between your nodes (meshes to be rendered) hold some sort of indication of the number or cost of state changes going from one node to the other, and by organising your traversal well, you achieve some sort of near-minimal number of state changes. The problem with this is it is almost completely bogus reasoning. There's three reasons for this:\n\n1. Typically, a graphics-card driver will try to take the entire state of the rendering pipeline and optimise it like crazy in a sort of "compilation" step. In the same way that changing a single line of C can produce radically different code, you might think you're "just" changing the AlphaTestEnable flag, but actually that changes a huge chunk of the pipeline. [[Oh but sir, it is only a wafer-thin renderstate...|http://en.wikipedia.org/wiki/Mr._Creosote]] In practice, it's extremely hard to predict anything about the relative costs of various changes beyond extremely broad generalities - and even those change fairly substantially from generation to generation.\n\n2. Because of this, the number of state changes you make between rendering calls is not all that relevant any more. It used to matter in the DX7 and DX8 eras, but it's far less so in these days of DX9, and it will be basically irrelevant on DX10. The card treats each unique set of states as an indivisible unit, and will often upload the entire pipeline state. There are very few //incremental// state changes any more - the main exceptions are rendertarget and some odd non-obvious ones like Z-compare modes.\n\n3. On a platform like the PC, you often have no idea what sort of card the user is running on. Even if you ID'd the card, there's ten or twenty possible graphics card architectures, and each has a succession of different drivers. Which one do you optimise for? 
Do you try to make the edge-traversal function change according to the card installed? That sounds expensive. Remember that most games are limited by the CPU, not the GPU, and you've just added to the asymmetry of that load.\n\nAnyway, this is something we can argue til the cows come home. I just wanted to give my tuppen'orth against the "prevailing wisdom" of using a Scene Graph. I'm not saying there aren't some good reasons for using one, but after writing a fair number of engines for a fair number of games, I have yet to find any of those reasons particularly compelling. They invented a lot of cool stuff in the 70s and 80s, and there's plenty of it that we continue to [[ignore at our peril|Premultiplied alpha]]. But some of it was purely an artifact of its time, and should be thwacked over the head and dumped in a shallow grave with a stake through its heart.
Wolfgang Engel and Eric Haines have managed to get the ~ShaderX2 books made [[free and downloadable as PDFs|http://www.realtimerendering.com/blog/shaderx2-books-available-for-free-download/]]. I wrote three articles that were included in ~ShaderX2 Tips & Tricks, and it's nice to have them "live" and on the net. I'd also like to convert them to HTML (as I have done some previous articles) and put them on my website - much easier to search than ~PDFs - but honestly I'm so busy with all things [[Larrabee]] that it may be some time before I work up the energy. But a quick self-review is possibly in order:\n\n''Displacement mapping''\n\nOh dear - a lot of talking, and almost no content! It's a "Tom talks for a while about the state of the industry" article. Unfortunately, the industry still hasn't figured out how to do displacement mapping, which is immensely frustrating. We're still stuck in the chicken-and-egg of having to solve three difficult problems before we get any progress.\n\nIn my partial defence, all the content is in the very first article in ~ShaderX2 - "Using Vertex Shaders for Geometry Compression" by Dean Calver. We realised early on that our articles would overlap, and since Dean had shader code ready to go, he talked about that while I talked about tools. You can think of them as two parts of a larger article.\n\nAside from moaning about why hardware doesn't do displacement mapping (and it still doesn't!), I think the main goodness from the articles is [[Displacement Compression]], which I think is still an interesting technique. There's some more details on the algorithm in my [[GDC2003 paper|http://www.eelpi.gotdns.org/papers/papers.html]], and Dean shows some good shader code. Sadly, [[MuckyFoot]] dissolved before we were able to battle-test the pipeline described in the article, so we'll never know how elegantly it would have worked in practice. 
I always wanted to bolt the system into [[Granny3D]]'s export pipeline, but it just wasn't a natural fit. I still think it's an excellent way to render high-polygon images without waiting for [[Larrabee]] to change the world.\n\n''Shader Abstraction''\n\nIt's always difficult to talk about engine design in articles without gigantic reams of code. I think this is one of my more successful efforts, but it's always hard to be sure. A lot of these concepts were expanded upon by others in the later ~ShaderX books (where I was sub-editor of the "3D Engine Design" section).\n\nProbably the one part of this article that I haven't seen discussed elsewhere is the texture compositing system. That was used in a production pipeline (a prototype in [[Blade II]], and more extensively in the following project), and it worked well. The biggest benefit was that the artists could use any file and directory structure they liked, and they could copy and rename textures at will, store them in any format they liked, and so on. This made life much simpler for them, as they didn't need to constantly worry about bloating disk or memory space. When it was time to bake the assets together on the disk, all that duplication was removed by hashing, and the fiddly stuff like format conversion and packing specular maps into alpha channels all happened automatically. Discussing this with other game devs I know others have invented similar systems, I've just never seen anybody write about them.\n\n''Post Process Fun with Effects Buffers''\n\nThis was the result of me trying to figure out how to structure an engine that did more than just the usual opaque and alpha-blended stuff. It adds a lot of structure on top of a standard pipeline. In one sense I'm not that happy with it - it turned out far more complex than I originally hoped, with lots of structures, dynamic allocation of rendertargets, and object callbacks. It feels like overkill for a couple of cheesy post-processing effects. 
On the other hand, once you have that complexity, it does make some things pleasingly elegant. What is still unproven is whether that extra complexity plays well with a full-blown game rendering engine rather than a simple demo. I still think there's some interesting ideas here.
GDC2007 was a lot of fun, though tiring. On the first day, I did two talks.\n\nThe first was about shadowbuffers. Yes, I know, again. What can I say - a GDC wouldn't be complete without me giving yet another talk about shadowbuffers - can't fight tradition. Anyway, for this one I figured out how to combine ID and depth shadowbuffers to get the best of both worlds - no surface acne, and no Peter Pan problems. Additionally, this is robust - it doesn't fall over according to the size of your world or the distance to the light or what objects are in the scene, which is a constant problem with depth-only shadowbuffers. As a nice bonus, it only needs 8 bits of ID and 8 bits of depth, so not only do you only need a 16-bit-per-texel shadowbuffer, but it works on pretty much every piece of hardware out there. Certainly everything with shaders or register combiners (including Xbox1 and GameCube), and you could probably even get it to work on a PS2 if you tried hard enough. I think this problem can now be marked SOLVED, and combined with Multi Frustum Partitioning (which is still more expensive than I'd like, but does work), I think hard-edged shadows are also a SOLVED problem. Now we just have to make the edges nice and soft - plenty of methods in the running, but still no clear winner. Lots of people researching it though, so I think it shouldn't be too long.\n\nThe other talk was only half an hour, and it was a brief look at ways to integrate lots of complex shaders into our pipelines. Most people are moving from just a few simple shaders in previous gen hardware up to the full combinatorial craziness of today's hardware. There's a lot of methods that work for 10s of shaders that just don't scale when you get to 100s. This talk explores the methods I've used in the past when throwing large numbers of shaders at multiple platforms. 
It's certainly not meant as gospel - there's many ways to skin this cat - but it should be a decent start for those bewildered by the options.\n\nBoth these talks are on my [[papers|http://www.eelpi.gotdns.org/papers/papers.html]] page. The shadowbuffer one also has a demo with it - let me know if you have trouble running it, it hasn't had much testing on other machines.\n\nIt was good to get those talks out of the way early on. After that I could focus on a day or two of stuff for the SuperSecretProject, and then relax and be a RadGameTools booth-babe for the rest of the show. And now I need some sleep.
Kinda silly, but the question keeps coming up - which is it? The answer is - both or either, add a space as desired. They're all the same thing, just depends who you ask. I prefer "shadowbuffer", because the word "buffer" implies a certain dynamic quality. Framebuffers and Z-buffers are single-frame entities, and dynamically updated by the hardware, just like shadowthingies. On the other hand vertex, index and constant buffers generally aren't, so it's not a universal quality. Shadowmaps are very much like texture maps - indeed, they're sampled with texture hardware, and there is a "mapping" between what is in them and the visible frame. However, I tend to think of "shadowmaps" as the reverse of "lightmaps" - they are calculated offline and not dynamically generated. Taken all together, I prefer "shadowbuffer".\n\nAs for the rest of the world - well, it's not my fault if they're wrong. But just for information, Google says:\n"shadowbuffer": 1,160 hits.\n"shadow buffer": 2,790,000 hits.\n"shadowmap": 20,900 hits.\n"shadow map": 32,600,000 hits.\n\nSo what do I know?
Phew! That was exhausting. My first Siggraph and it was right in the public eye. But I met a lot of very smart people, and got introduced to a lot of prominent academics - it's fun to be able to meet people after reading all their papers.\n\nThere was a serious amount of "Larrabuzz" around at Siggraph, and a lot of Intel people were being swamped by questions - most of which unfortunately we're not allowed to answer yet. If we were vague on some topics, we all apologize - it's tough remembering what information is not yet public. An added restriction was the length of talks - at Siggraph we had four talks on Larrabee totalling only 90 minutes, and there's only so much information you can pack into that time.\n\nThe main paper was "Larrabee: A ~Many-Core x86 Architecture for Visual Computing" (available from the ACM or your favourite search engine), and was presented by Larry Seiler on Tuesday. It outlined the hardware architecture and some of the reasoning behind the choices we made when designing it. It was extremely exciting to see this as an official Siggraph paper after surviving the whole rigorous process of peer-review and rebuttals. It's gratifying to know that the academic community finds this an interesting and novel architecture, and not just another graphics card.\n\nOn Thursday there were three talks on Larrabee in the "Beyond Programmable Shading" course. I talked about the Larrabee software renderer, Matt Pharr talked about how the architecture opened up some new rendering techniques, and Aaron Lefohn talked about the extension from there into general-purpose computing, including the term "braided parallelism" which I think is an awesome metaphor for the mix of thread and data parallelism that real world code requires. 
In that same course we had similar views from the AMD and Nvidia side of things, previews of the ~APIs that will be driving them from Microsoft and Apple, and also some fascinating glimpses into the future from Stanford, UC Davis, Dartmouth and id Software (cool demo, Jon!). All of us were under the gun on time limits and everyone had to drop content to fit, so well done to all of the speakers in the session - we managed to keep it on-schedule. [[All the notes are available for download here|http://s08.idav.ucdavis.edu/]] - my notes on the site are slightly longer and more detailed than the ones I spoke to (32 minutes instead of 20), so they're worth looking at even if you went to the talk.\n\nSupplemental: also see the [[Siggraph Asia 2008]] versions of the talks for a different view on things.
Just noticed that the Siggraph Asia 2008 course notes for the "Beyond Programmable Shading" course are [[available for download|http://sa08.idav.ucdavis.edu/]]. At first glance they're a similar structure to the [[Siggraph 2008]] ones, but Tim Foley has some new slants on the material that are pretty interesting. He's highlighted a slightly different take on things, and it might give people some more insights into the nature of The Bee.
It's only pretending to be a wiki.
TomF's Tech Blog
http://www.eelpi.gotdns.org/blog.wiki.html
There's a discussion I keep having with people, and it goes something like this:\n\n"Hey Bob - how's it going at ~UberGameSoft?"\n"Hey Tom - it's going well. I'm figuring out a neat HDR system, Jim's doing a cool physics engine, and Frank's got some awesome post-processing effects."\n"Cool. So you guys must be well into production by now, right?"\n"Kinda, but not really. We still have trouble with the whole export pipeline and there's a lot of pain getting assets into the game."\n\nAt this point, I usually start yelling at people and preparing another ranty blog entry. And thus...\n\nThere's a bunch of stuff that is neat, and it's geekily cool, and we as engineers all love to play with them and they generate some killer GDC talks and suchlike. But they're all unimportant next to the stuff that is necessary to ''ship the game''. We're now looking at team sizes of 50-150 people for 2-4 years, and that's a lot of assets.\n\nYou need the smartest coders in the building to be making sure that everyone else can work efficiently. And that means they're tools coders. Yes, "tools" - don't look at me like I said a dirty word.\n\nThe problem is that at heart a lot of us are still bedroom hackers. And that's a great mentality to have - it keeps us honest. But at those numbers, you have to have some of the dreaded word - "management". But we're smarter than the average bear, so we can use geek management, not [[PHB|http://en.wikipedia.org/wiki/Pointy_Haired_Boss]] management. And it's not management of people so much as management of data. Good people who like the work they do and the people they work with will generally manage themselves pretty well. The problems come when the data flowing between them sets up dependencies. Now obviously that's inevitable - but people can cope with obvious dependencies - like you can't animate a character very well until the rigging is done or whatever.\n\nThe problem is dependencies that aren't as obvious. 
They're obvious to us coders, because we're the ones who write the systems. And unfortunately the easiest way to write most systems is to have all the source data available at once, throw it in a big post-processing hopper and out comes the DVD image, and then we ship it. Fairly obviously, this is a disaster in practice. So instead we need to think carefully about the chain of dependencies and how we can isolate one person's changes from another, and allow people to work on related objects at the same time without getting stalled.\n\nThis is almost exactly the other big problem coders are facing - parallel processing. And we all know that parallel processing is really really difficult, so we put our smartest coders on the problem. And the same should happen with the tools - it's an even bigger parallel processing challenge, and the problem is not as well-defined as "make the physics go faster" because half the time when you start a game you don't actually know where you're going.\n\nSo I'm going to do some more posts on some of the detailed aspects once I get them straight in my head. But the big message is really that the old days of the hero-coder saving the day with leet gfx skillz are over. Shaders just aren't that difficult to write, HDR is a matter of detailed book-keeping, and shadows may be a pain in the arse but they're not fundamentally going to make or break your game. Good asset management on the other hand can mean the difference between shipping and going bankrupt. That's what the new breed of hero-coder needs to focus on.
Like... in a proper academic paper and everything! This was two years ago, but I only just found it. They took sliding window VIPM and fixed one of the major problems - the poor vertex-cache efficiency. They also tried to improve the memory footprint, but weren't as successful with that. But anyway - real academics writing real papers based on my stuff. Totally cool! I've had citations before in related papers, but never anyone deriving stuff directly and primarily from my work.\n\n[[Topology-driven Progressive Mesh Construction for Hardware-Accelerated Rendering|http://www.graphicon.ru/2004/Proceedings/Technical/2%5B4%5D.pdf]] - Pavlo Turchyn, Sergey Korotov\n\nYeah, OK, so they're from an improbably-named [[Finnish university|http://www.jyu.fi/]] and their paper was published in a Russian journal and it's not even listed on Citeseer. Whataddya want - a personal recommendation from Stephen Hawking? Jeez you people...
I thought I'd better write some notes on my [[GDC 2003 talk|http://www.eelpi.gotdns.org/papers/papers.html]] on SH in games, as in hindsight there were some errors and some items cut for time.\n\nError - slide 5 - "To multiply two ~SHs, multiply their weights". Not true. Although you can add two ~SHs by adding their components, you can't do a modulation of one by the other that way. Doing a componentwise multiplication is a convolution, which is a completely different operation that I really don't fully understand.\n\nSlides 12 and 13 didn't tell you what the magical fConstX values were! Sorry about that. The correct values are found by taking the canonical SH coefficients (for example http://www.sjbrown.co.uk/?article=sharmonics), and then convolve with the clamped cosine kernel (i.e. to actually do the lighting with the standard clamp(N.L) equation we know and love). This is easy - multiply the first value by 1, the next three by 2/3 and the last five by 1/4. Finally, you'll probably want to convert the output from radiance (in some arbitrary units) to a graphics-friendly [0,1] range by scaling everything.\n\nAnd the results are:\nfConst1 = 4/17\nfConst2 = 8/17\nfConst3 = 15/17\nfConst4 = 5/68\nfConst5 = 15/68\n\nEasy! Also note that of course slide 13 - the vertex shader code - can be refactored to change (3*z*z-1) into just z*z by massaging the values held in sharm[7] and sharm[0] back in slide 12. That's left as a fairly simple exercise for the coder - I didn't want to confuse the slides with non-canonical equations.\n\nThanks to [[Peter-Pike Sloan|http://research.microsoft.com/~ppsloan/]] for getting this all straight in my head. He actually //understands// the fundamental maths behind this stuff, whereas I'm more of a ... er ... 
general concepts man myself (in other words, I suck).\n\nVolker Schoenefeld wrote a good paper on this - see [[More on SH]].\n\nRobin Green also wrote a good talk called "Spherical Harmonic Lighting: The Gritty Details" [[http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf|http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf]]\n\nAnd to complete the little circle of SH incest, a talk by Chris Oat of ATI/AMD at GDC04: [[http://ati.amd.com/developer/gdc/Oat-GDC04-SphericalHarmonicLighting.pdf|http://ati.amd.com/developer/gdc/Oat-GDC04-SphericalHarmonicLighting.pdf]].\n\n\n\nWhat I also had to miss out from the slides were some details of the implementation that I used in the Xbox1, ~PS2 and Gamecube versions of a game that never got finished before [[MuckyFoot]] vanished. This is actually the coolest bit about using SH illumination in real games. So the situation is that we wanted to give the artists a lot of control over their lighting environment. The problem is, what that usually means is they want a lot of lights. That gets expensive - the practical limit on Xbox1 and ~PS2 lights is about three directional lights with specular (on the Xbox you'll get diffuse bumpmapping on one or two of those with a pixel shader), and the Gamecube only has vanilla ~OpenGL lights (which is a lot worse than it sounds - they're not very useful in practice), and they're not fast.\n\nObviously the bar has been raised since - the 360 and ~PS3 can do really quite a lot of vertex and pixel shading, and although the Wii is still the same old Gamecube hardware, it is a bit faster. However, when I say the artists wanted a lot of lights, I don't mean 5, I mean 50. Most of them are fill and mood lights and don't need anything but per-vertex diffuse shading (i.e. not bumpmapping or specular shading), but that's still a lot of lights to ask the vertex shaders to do.\n\nThe solution we came up with was elegant. First, bake all the "mood" lights into the SH. 
Then of the "focus" lights (that need bumpmapping and specular), pick the brightest four shining on the object. Bake the rest into the SH. Take the brightest four, order them by brightness (1st = brightest, 4th = least bright). The fourth always gets baked into the SH as well. The third one is split into two parts - a real light and an SH fraction - and the brightness of these two always adds up to the brightness of the real 3rd light. You split it up by using the relative brightnesses of the 2nd, 3rd and 4th lights so that when the 3rd and 4th lights are almost the same brightness, the 3rd light is all SH. And when the 2nd and 3rd lights are almost the same brightness, the 3rd light is all real, with no SH component. This way, as the object moves around, lights will move from being real lights to being baked into the SH nice and smoothly and you won't see any popping as they change types.\n\nSo that gives you two lights and some fraction of a third that are done properly - bumpmapping, specular, point light (instead of a directional approximation), etc. You can of course use four or five or even two. I found that two looked not that great, and four or five didn't improve matters significantly - three (well, two-and-a-bit) seems to be the visually good number, but you should experiment yourself.\n\nThat's what we were going to do originally, and that remained the plan for the Xbox1. However, the ~PS2 didn't seem able to vertex-shade even three point lights with proper specular, so that was cut back to just one! Those paying attention will realise that actually that's some-fraction-of-one light - yes, if you were standing in between two equally bright lights, there was no proper lighting performed at all! Still, that's what needed to happen for speed, and it didn't actually look objectionable, just a bit washed-out at times compared to the Xbox1.\n\nThe Gamecube was a challenge - it has no vertex shading hardware at all - so what to do? 
We tried just feeding the brightest four lights into the standard ~OpenGL lights, and then feeding the zeroth element of the SH into the ambient term (which is something people have been doing for ages - way before SH came along). But it looked rubbish. The artists would put ten yellow lights spread out on the ceiling of a corridor and ten blue lights providing underlighting, and instead of doing the right stark cinematic yellow-opposite-blue lighting, all that would happen is you'd get a random assortment of a soft blue and/or yellow tinge in a random direction, and a muddy grey ambient. Looked awful. So instead the chap doing the GC version (Andy Firth) suggested we take the first four coefficients of the SH (rather than the first nine) and emulate them with OGL lights. So the zero term was still the ambient, the brightest real light was a real OGL light, and then the six cardinal directions (positive and negative x, y, z) each got a directional OGL light shining along that axis. You need six because the OGL lights get clamped on the backside of the light, so if your first coefficient is -0.5, then you need a -0.5 light shining along from the +X axis and a +0.5 light shining along from the -X axis. This was a super neat hack, and it brought the GC lighting quality nearly up to the ~PS2's.\n\nAlthough we weren't exactly //happy// with only having one light and a 4-component SH, the important thing was that the artists now had a reliable lighting model - they could put a bunch of lights in the scene and you'd get something pretty close on all three platforms. They accepted that Xbox1>~PS2>GC in terms of quality, and it was //predictable//. Whereas the previous pre-SH lighting schemes would pick a random assortment of lights out of the ones they'd placed, which was very frustrating for them, and heavily limited the sort of "mood" they could set with lights.\n\n\nBut that wasn't the really cool bit. Oh no. 
The //really// cool bit is that you don't have to just project a directional light into the SH. You're doing this once per object - you've got a few CPU cycles to play with. So why not have some more flexible light types? The simplest tweak, which actually worked really nicely, was to give lights a volume. Normally, lights are done as point lights - infinitely small. Of course you typically set up some sort of falloff with distance to control it a bit better. I loathe the canonical OGL/DX light attenuation - it's really hard for artists to control well. Far better is a linear falloff that goes from a minimum distance (at which the light has maximum brightness, whatever the artist sets that to be) out to a maximum distance (at which the light is ZERO). The maximum distance is also nice for culling - you can actually throw the light away and not worry about it at all. That's all pretty easy, and again people have been doing that for ages.\n\nThe problem is that sometimes you want a single bright light that illuminates a large area - let's say it has a brightness of the standard 1.0 at a distance of 10m. But now what happens when an object is 5m from the light? If you set the minimum distance at 10m, that object won't be any brighter, which looks odd. So move the minimum distance in and crank up the brightness. Well, your object is kinda brighter - but the lighting is still odd. Imagine the illuminated object is a sphere. So now a large part of the sphere (maybe a quarter of the surface) is saturating at full brightness, and then there's a falloff, and then still half the sphere is black. In a real room what you'd get from a bright light like that is backscatter illuminating the back side of the sphere, and indeed the shading of the sphere would still be smooth all over without any odd-looking saturation. 
The other effect is that really a light like that would often be physically larger than the sphere, so it wouldn't just be a directional light - the outer parts of the light would be illuminating around the edges of the sphere. And indeed as you get closer, almost the entire sphere would have some part of the light that would be able to see it directly.\n\nAnyway, so I gave the artists a physical radius in meters that they could set the light to, rather than being a single point in space. Then I do a bit of sensible-seeming maths involving the radius of the object being illuminated, the radius of the light illuminating it, and the distance of the light from the object. This gives me an approximation to how much "wrap around" I should give this light (a small object should get more wraparound than a large one). There's also an "ambient bounce" factor that the artist can set to simulate the light being in a small, white-walled room and causing backscatter as opposed to stuck in the middle of a big open space. But notice this ambient bounce still scales according to the distance between light and object.\n\nAnd then when I go to do the canonical directional-light SH evaluation as given in the slides, the ambient bounce factor just adds light into the sharm[0] ambient factor, and the wrap around factor does that and also shrinks sharm[4] through sharm[8] towards zero. This last one needs some explaining - basically what I'm doing is lerping from the real clamped cosine kernel (as given in the slides) into one that does (N.L*0.5+0.5), i.e. it has a bright spot on one side and a dark spot on the other, and a smooth blend between those, with no clamping needed. This is the fully wrapped around lighting state - when the object is right up against the light source. It should be obvious that a kernel like that is just sharm[0]-[3] = 0.5 and sharm[4]-[8] = 0.0. 
So lerp between that and the standard directional coefficients using the wrap-around factor.\n\nThe neat bit is that this is also pretty easy to add to the per-pixel bumpmapping stuff (it's a modification of the idea of hemispherical lighting), so you can get wraparound lighting on that as well. You can also tweak the specular lighting so that your highlight size gets larger as you approach the light. This looks really nice - as objects get close to large, bright lights, they still seem to get brighter, even though the brightest-lit pixels are still maxed at full-white. It's pretty neat, and the artists loved having even more knobs to play with on their lights. Sadly, the game (and the company) was shut down before we got to the stage where they could really play with final assets and lighting, so we never found out what the system could do when given some full-time artist loving.
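Two of the numeric steps in this entry are easier to follow as code, so here is a rough Python sketch of each. The first function rebuilds the fConst values under one reading that happens to reproduce them: each constant is the square of the canonical SH basis constant (it shows up once when the light is projected and once when the normal is evaluated), times the clamped cosine factor, then scaled so that a light shining straight down a normal gives exactly 1. That normalisation, and the pairing of fConst1 to fConst5 with the constant, linear, xy-type, (3*z*z-1) and (x*x-y*y) terms, are my assumptions rather than anything stated on the slides. The second function is one possible way to split the third-brightest light; the names and the linear blend are hypothetical, it just satisfies the two end conditions described above.

```python
from fractions import Fraction


def sh_lighting_constants():
    """Return [fConst1..fConst5] as exact fractions (assumed derivation)."""
    # Squares of the canonical real SH basis constants for bands 0..2, with
    # the common factor 1/pi left out (e.g. Y00^2 = 1/(4*pi) becomes 1/4).
    basis_sq = [
        Fraction(1, 4),    # constant term
        Fraction(3, 4),    # x, y, z
        Fraction(15, 4),   # xy, yz, zx
        Fraction(5, 16),   # 3*z*z - 1
        Fraction(15, 16),  # x*x - y*y
    ]
    # Clamped cosine kernel from the text: 1 for the first value,
    # 2/3 for the next three, 1/4 for the last five.
    kernel = [Fraction(1), Fraction(2, 3),
              Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
    raw = [b * k for b, k in zip(basis_sq, kernel)]
    # Assumed normalisation: light along +z, normal along +z, result 1.0.
    # Only the constant, z*z and (3*z*z-1)^2 = 4 terms are non-zero there.
    full_on = raw[0] + raw[1] + 4 * raw[3]
    return [r / full_on for r in raw]


def split_third_light(b2, b3, b4):
    """Split the 3rd-brightest light into (real_part, sh_part).

    Hypothetical blend: all SH when b3 == b4, all real when b3 == b2,
    linear in between. Expects b2 >= b3 >= b4.
    """
    span = b2 - b4
    # If all three are equally bright the choice is arbitrary; use all SH.
    real_fraction = 0.0 if span <= 0 else (b3 - b4) / span
    return b3 * real_fraction, b3 * (1.0 - real_fraction)


if __name__ == "__main__":
    for i, c in enumerate(sh_lighting_constants(), start=1):
        print(f"fConst{i} = {c}")
    print(split_third_light(1.0, 0.75, 0.5))
```

The two parts returned by split_third_light always add up to the original brightness, which is the property that stops lights popping as they swap between real and baked.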
The game I'm most proud of working on. A "peep sim" very much in the vein of Theme Hospital and Dungeon Keeper (not much surprise really - people from both those teams joined Muckyfoot to work on StarTopia, and indeed later worked on Lionhead's "The Movies"). Lots of interesting graphics stuff to do there, and much later I even retrofitted shadowbuffers to it. More [[here|http://www.eelpi.gotdns.org/startopia/startopia.html]].
''WORK IN PROGRESS'' - which is why it's not on the main page yet.\n\nThis is an attempt to gather together some of the knowledge I have about writing streaming resource systems. That is, game engines (rendering, sound, collision, etc) which do little or no bulk-loading of data in "levels", but can instead stream data as the player progresses.\n\nThis is mainly from my experience writing engines at [[MuckyFoot]], firstly with the game [[Blade II]] which shipped on ~PS2 and Xbox1, and then the engine used for the next two projects, neither of which were actually shipped, but were far enough along in production to pound out the problems with the system. It also contains the results of discussions with others about their streaming systems, and the problems they had with them. I suspect the success of my system has more to do with luck than planning, but the lessons are nevertheless useful!\n\nSome resource types, such as textures and to a lesser extent meshes were the space-hogs, and there were sophisticated systems to make sure we only loaded exactly what we needed at the LOD we needed.\n\n\n_Resource framework_\n\nResources + priorities + LOD levels\nProd & decay\nDecay done with a timestamp & logarithmic decay. Preserves the idea of "twice as important as". Timestamp allows decay without constant checks.\nPriority queue\nFetch queue - 20 items long - allows optimisation of seek times. 
Once something is in fetch queue, it is loaded\nEvict queue - can't remove items immediately (GPU may still be using them).\nAvoiding constant PQ updates - stochastic updates instead.\nFragmentation\nDefragmentation\nChanging up a LOD level\nChanging down a LOD level\n\n_Setting priorities_\n\nScale by distance\nWalk through visible portals\nWalk through closed but unlocked doors\nWalk through locked doors with lower priority\nFlood-fill X portals after visible frustum (corridor problem)\nBreaking objects (suddenly need new textures, etc)\n\n_Cinematics_\n\nRecord & play back 5 seconds ahead\nAt decision points, have scripts add "future cameras" that are treated like real cameras with slightly lower priority.\n\n_Textures_\n\nJust-too-late\nTinytextures\nConsole mipmap streaming\nPC mipmap streaming\n\n_Sound_\n\nNo traditional LOD like meshes & textures.\nSpot effects - JTL - if not there, not played.\nBarks - JTL - if not there, played when loaded, unless later than 1 second.\nLonger music & speech - traditional buffered streaming with max-out priority bump at half-empty\n\n_Animations_\n\nIn Blade II, these didn't take that much memory, so were not a sophisticated system.\nDidn't have any LODs in our engine, but some libraries (e.g. Granny) do, so use them like texture LODs.\n\n_Meshes_\n\nPS2 did not have VIPM (too complex for available engineering time), so were either present or not.\nXbox1 used VIPM LOD with bumpmapping.\nAlmost every mesh in game had VIPM chain - most were automatically generated.\nFor simplicity, VIPM resources only had three LODs. Example for a character was min=150 tris, normal = 5k tris (same as PS2 version), max = 50k.\nNo equivalent of "tinytexture" that is always resident - meshes can't go much below 150 tris, and that's still too big (~6kbytes each)\nActual Xbox rendering was continuous LOD. 
No popping, finer granularity saved vertex throughput as well as for streaming.\nMeshes cause texture prods - they know what mipmap levels they need. Textures inherit modified priority of mesh.\n\n_Game objects_\n\nProd meshes with priorities.\nProd animations with sounds.\nPriority bump for player, enemy, important objects etc. Scenery and so on considered at normal priority.\nArtificial prod for alternative meshes (e.g. broken versions, "used" versions) when player nearby.\nProds for equippable objects, e.g. weapons, so when you change, you don't have empty hands for a second.\nProds for animations.\n\n_Collision objects & AIs_\n\nCollision objects have high priority. Range-bubble dependence from player - no portals.\nAIs also range-dependence. If they find a missing collision object, they freeze & request it with even higher priority.\nCertain important AIs have their own range bubbles.\nScripts can raise priorities, give certain AIs larger range bubbles, etc\nKeep raising priorities until there are no AI freezes. This data is usually small, and over-loading is usually not a problem.\n\n_Refinements_\n\nBetter PQ implementation. Maybe use buckets & only finely-sort the top bucket?\nMaybe try a hardware-like L1/L2 cache? L1 = resident, L2 = waiting to be resident.\nSeparating LOD from priority. Ideal LOD = normal priority, next one down = higher priority, next one up = low priority.\nMaybe always have two "objects" in LODdable objects - the one that is loaded, and the next higher one.\nAI LODs - use coarser collision geometry, or have pre-authored movement tracks that they revert to (e.g. street pedestrians)\nTry low-rez sounds - maybe far-off sounds are fine to play back at higher compression levels? Drop to 22kHz, 11kHz, or more compressed MP3/Ogg levels?\nHave 1st second of sound in memory, as soon as it starts playing, stream the rest? Many sounds <1second long - doesn't help them.\nFor sounds with multiple variants (e.g. 
footsteps, gunfire, etc) only have a few permanently loaded, and when they start playing, load the rest.\nSame for animations, e.g. 10 different walk anims. Or use generic walk anim until specific depressed-wounded-troll-with-limp anim has been loaded.
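The "timestamp & logarithmic decay" note in the resource framework list is terse, so here is a minimal Python sketch of one way to read it. This is not the MuckyFoot code: the class name, the half-life value and the choice to keep the larger of the old and new priority on a prod are all assumptions. The idea is that every priority halves at the same rate, so storing log2(priority) plus the time of the last prod is enough to work out the current value on demand, and two resources that were "twice as important as" each other stay that way as they decay.

```python
import math

HALF_LIFE = 2.0  # seconds for a priority to halve; hypothetical tuning value


class DecayingPriority:
    """Priority that halves every HALF_LIFE seconds, stored as log2 + timestamp."""

    def __init__(self):
        self.log_pri = -math.inf  # never prodded yet
        self.stamp = 0.0          # time of the last prod

    def current_log(self, now):
        # Exponential decay is a straight line in log space.
        return self.log_pri - (now - self.stamp) / HALF_LIFE

    def prod(self, priority, now):
        # Assumed policy: a prod never lowers the priority, it keeps the
        # larger of the decayed old value and the new request.
        self.log_pri = max(self.current_log(now), math.log2(priority))
        self.stamp = now

    def value(self, now):
        return 2.0 ** self.current_log(now)

    def sort_key(self):
        # Log priority extrapolated back to t = 0. It does not depend on
        # "now", so queue order only changes when something is prodded.
        return self.log_pri + self.stamp / HALF_LIFE


if __name__ == "__main__":
    texture, mesh = DecayingPriority(), DecayingPriority()
    texture.prod(8.0, now=0.0)
    mesh.prod(4.0, now=0.0)
    for t in (0.0, 2.0, 6.0):
        print(t, texture.value(t), mesh.value(t))
```

Because sort_key is independent of the current time, nothing has to walk the priority queue every frame just to apply decay, which is how I read "allows decay without constant checks".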
While I'm having a rant - strippers. STOP IT. Stop writing academic papers about generating the ultimate strips. It's all totally pointless - pretty much every bit of hardware has indexed primitives (except the PS2, which has a bunch of completely bonkers rules that are ''nothing'' like standard strips anyway). The ultimate stripper will get you one vertex per triangle. But even a very quick and dirty indexer will get you that, and a good indexer (e.g. the one I put in [[Granny3D]]) will get close to 0.65 vertices per triangle for most meshes with a 16-entry FIFO vertex cache. The theoretical limit for a regular triangulated plane with an infinitely large vertex cache is 0.5 verts/tri (think of a regular 2D grid - there's twice as many triangles as vertices), so that's not too shabby.\n\nNow, it is true that once you have chosen your triangle ordering for optimal vertex-cache efficiency, you may then want to express them as indexed triangle strips. But the key is in the first word - //indexed//. You've gone to all that trouble to order your tris well - don't change that order! Given that order, you can still rotate the triangles to express things as short strips, especially when you have a strip-restart index (e.g. the newer consoles and DX10), and that is pretty efficient. Since vertices are so much larger than indices, and re-transforming them if they drop out of the cache is so expensive, it's worth spending a few extra indices to keep the vertex cache toasty warm. But generating those optimal strips is pathetically simple - precisely because you can't change the ordering of the triangles.\n\nAll those academics still writing papers on this - go find something more useful to spend your time on. We've had indexed primitive hardware for <$50 for almost a decade now. Welcome to the 21st century.
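If you want to play with the verts-per-triangle numbers above, a FIFO vertex cache is easy to simulate. This is a toy Python sketch, not the Granny3D indexer: the grid mesh, the "vertical band" triangle order and all the function names are made up for illustration. It counts how many vertices miss the cache (and so need transforming) per triangle for a given index order.

```python
from collections import deque


def grid_triangles(n, band=None):
    """Index list for an n x n quad grid, two triangles per quad.

    band=None walks each full row in turn. band=k walks the grid in vertical
    bands k quads wide, finishing one band before starting the next.
    """
    band = band or n
    stride = n + 1  # vertices per row
    tris = []
    for x0 in range(0, n, band):
        for y in range(n):
            for x in range(x0, min(x0 + band, n)):
                v0 = y * stride + x  # top-left
                v1 = v0 + 1          # top-right
                v2 = v0 + stride     # bottom-left
                v3 = v2 + 1          # bottom-right
                tris.append((v0, v1, v2))
                tris.append((v1, v3, v2))
    return tris


def verts_per_tri(tris, cache_size=None):
    """Cache misses per triangle for a FIFO cache (None = infinitely large)."""
    fifo = deque()
    resident = set()
    misses = 0
    for tri in tris:
        for v in tri:
            if v in resident:
                continue  # a hit does not reorder a FIFO cache
            misses += 1
            fifo.append(v)
            resident.add(v)
            if cache_size is not None and len(fifo) > cache_size:
                resident.discard(fifo.popleft())
    return misses / len(tris)


if __name__ == "__main__":
    print("infinite cache:", verts_per_tri(grid_triangles(30)))
    print("rows, 16-entry FIFO:", verts_per_tri(grid_triangles(30), 16))
    print("bands of 6, 16-entry FIFO:", verts_per_tri(grid_triangles(30, band=6), 16))
```

On this toy grid the plain row-by-row order lands a little over one vertex per triangle with a 16-entry cache, because each row is too long for the previous row's vertices to survive. Walking the same triangles in narrow bands gets down to roughly 0.6, which is the sort of gain a cache-aware ordering buys you without touching strips at all.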
Now only secret, not //super// secret - it's [[Larrabee]].
When streaming textures from disk (see entries tagged as "Streaming"), there's no reason you must have the same format on disk as you do in memory. You're limited by the speed of the drive, so you can do some decompression there. This reduces disk footprint, which is nice, but it also reduces the distance the head has to seek between resources (because they take less disk space), which is very good. Obvious things to do are Zip-style lossless compression, but ~DXTn doesn't compress that well. So you'd want some slightly lossy compression (because hey - ~DXTn is pretty lossy already - what's a bit extra?), but people are having a tough time finding ways to start with a ~DXTn image and add further lossiness - ~DXTn's sort of compression is really unfriendly to that sort of stuff.\n\nSo the other way to do it is to start with the original image and use proper image-compression methods like JPEG, PNG, Bink, etc. Those get great compression ratios. The problem is that when you load them off disk, they decompress to something like 888, which chews video memory and can slow down rendering because it takes valuable bandwidth. We really want to compress to something smaller. It would also be very useful to be able to do procedural operations either on the CPU or GPU producing 888 data, and then compress the result.\n\nBut good ~DXTn compression is notoriously slow - the problem is searching for the two endpoint colours and the interpolation through 3D colour space - it becomes a really big searching problem. There have been some interesting techniques - [["Real-Time YCoCg-DXT Compression" by J.M.P. van Waveren and Ignacio Castaño|http://developer.nvidia.com/object/real-time-ycocg-dxt-compression.html]] is a good round up, and also a nice exploration of some alternative encodings (store the Y channel in the alpha of ~DXT5, which de-correlates it from the ~CoCg part, which increases quality and speeds up searches).\n\nIt's a neat trick, but I went the other way. 
~DXTn-style compression of 2D and 3D colour spaces is slow because the search space is large. But compressing 1D data has a far smaller search space and is much quicker. We can either use ~DXT1 with greyscale data, or in ~DX10 there is a ~BC4 format that stores single-channel data - it's basically just the alpha-channel of ~DXT5, without the colour data, so it's also 4 bits per texel.\n\nThe core of the idea is to take the image, shrink it by 4x in each direction, quantise it, and store it in a 565 format texture. Then scale that back up with bilinear filtering, and find the luminance of both it and the original image (i.e. greyscale them). Divide one luminance by the other - now you have a bunch of values that corrects the luminance of the low-rez 565 texture. Find a texture-wide scale&bias to let you store these luminances well - most of them are around 1.0, and we're only interested in the deviations from 1.0. Then store those as a full-size ~DXT1 or ~BC4.\n\nThe decompression shader code is very simple - sample both textures with the same U,V coords, scale&bias the luminance correction with globals fed in through pixel shader constants, and multiply with the result from the 565 channel.\n\nThe quality compared to standard ~DXT1 is pretty good. It takes 5 bits per texel (4 for the luminance correction channel, and 1 for the 565 texture, because it's shrunk by 16x). This is slightly larger than ~DXT1, but the visual quality seems to be very close, though I haven't done any PSNR studies. The average quality seems comparable, and the main difference is in highly-coloured areas. Here, ~DXT1 gives its distinctive block-edge errors, which are pretty objectionable. Using this technique instead blurs the colours out - essentially, you get the raw upsampled 565, with the luminance correction not doing all that much. 
While not significantly more correct than ~DXT1, it does not have any visible blocking, and so attracts the eye far less.\n\nI have also tried only shrinking the 565 by 2x in each direction. This obviously gives better fidelity, but increases the cost from 5 bits/texel to 8 bits/texel.\n\nIn the future I'd like to try converting to a luminance+chrominance space (e.g. ~YCbCr or ~YCoCg) and then just storing the luminance raw in the ~DXT1, and the chrominance in one half of a 4444 texture (i.e. either the RG or the BA channels). That way you can share the 4444 between two different textures and amortise the cost. It brings the 4x4 downsample to 4.5 bits per texel, or the 2x2 downsample to 6 bits, both of which are pretty acceptable. I'm a little curious about the quantisation there - is 4 bits enough for chrominance? You don't see quantisation problems on the current 565 texture because the luminance scaling corrects for it, but this would be a somewhat different colourspace.\n\nI'd also like to do something smarter with the ~DXT1 texture. Currently, I just store the luminance as RGB like you'd expect. But ~DXT1 has 565 end points, so you don't get true greys. Also, you could do some clever maths with dot-products to get higher precision. However, the thing to remember is that we do a scale&bias on this value - most of the errors are very close to 1.0. So in fact I suspect any extra precision and non-greyscale effects are insignificant in practice.\n\nThis also makes me wish we had some texture formats that were less than 4 bits per texel, even at the cost of accuracy. Those would be ideal for storing the luminance correction term. I guess you could use 332 to store three different luminance channels (2.7 bits per texel), but the logistics of sharing textures gets tricky.\n\nYou could also pack three luminance channels into a ~DXT1 - the problem is then you have to compress 3D data into ~DXT1, and avoiding that was the whole point of this exercise! 
However, if you didn't mind the higher compression cost, it might work quite well. The thing to realise is that most 4x4 blocks of luminance correction have very little data - most of the time, the upsampled 565 is very close to correct. It's only a minority of blocks that have correction data. So if you choose your three textures well (an exercise left to the reader :-), you can end up with very few blocks that have data in more than one of the 3 channels, and so still get good quality. Cost would be very small - it's 4 bits per texel ~DXT1, shared between 3 textures (so 1.3 bits each), and then 1 bit per texel for 4x4 shrink 565, for a total of about 2.3 bits per texel!
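Here is a rough CPU-side sketch of the luminance-correction scheme, mostly to make the order of operations concrete. Plenty of it is assumption rather than what I actually shipped: the Rec.601 luma weights, the box-filter downsample, and above all the uniform 4-bit quantiser, which is only a stand-in for real ~DXT1/~BC4 block compression. All the function names are made up for illustration. Images are lists of rows of (r, g, b) floats in 0..1.

```python
def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec.601 weights (an assumption)


def quantise(x, bits):
    levels = (1 << bits) - 1
    return round(min(max(x, 0.0), 1.0) * levels) / levels


def shrink_565(img, factor=4):
    """Box-filter down by `factor` in each direction, then quantise to 5:6:5."""
    h, w = len(img) // factor, len(img[0]) // factor
    small = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + j][x * factor + i]
                     for j in range(factor) for i in range(factor)]
            avg = [sum(p[c] for p in block) / len(block) for c in range(3)]
            row.append(tuple(quantise(avg[c], b) for c, b in enumerate((5, 6, 5))))
        small.append(row)
    return small


def bilinear(small, u, v):
    """Sample the low-res texture at normalised (u, v), clamped at the edges."""
    h, w = len(small), len(small[0])
    fx = min(max(u * w - 0.5, 0.0), w - 1.0)
    fy = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = fx - x0, fy - y0
    out = []
    for c in range(3):
        top = small[y0][x0][c] + (small[y0][x1][c] - small[y0][x0][c]) * tx
        bot = small[y1][x0][c] + (small[y1][x1][c] - small[y1][x0][c]) * tx
        out.append(top + (bot - top) * ty)
    return tuple(out)


def compress(img, factor=4, bits=4):
    """Returns the low-res 565 image, per-texel corrections, and the scale & bias."""
    h, w = len(img), len(img[0])
    small = shrink_565(img, factor)
    ratios = []
    for y in range(h):
        for x in range(w):
            base = luminance(bilinear(small, (x + 0.5) / w, (y + 0.5) / h))
            ratios.append(luminance(img[y][x]) / max(base, 1e-4))
    bias = min(ratios)                       # texture-wide scale & bias
    scale = max(max(ratios) - bias, 1e-6)
    corr = [quantise((r - bias) / scale, bits) for r in ratios]
    return small, corr, scale, bias


def decompress(small, corr, scale, bias, w, h):
    """What the pixel shader does: sample both, scale & bias, multiply."""
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            k = corr[y * w + x] * scale + bias
            base = bilinear(small, (x + 0.5) / w, (y + 0.5) / h)
            row.append(tuple(min(c * k, 1.0) for c in base))
        out.append(row)
    return out


def bits_per_texel(correction_bits=4.0, share=1, colour_bits=16, shrink=4):
    """Correction channel (possibly shared between textures) plus the shrunk colour texture."""
    return correction_bits / share + colour_bits / (shrink * shrink)
```

With the defaults `bits_per_texel()` gives the 5 bits per texel quoted above, `shrink=2` gives 8, `colour_bits=8` (half of a 4444) gives 4.5, and `share=3` gives 4/3 + 1, or about 2.3.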
Louise Forsyth (nee Rutter) is my //darling// wife (dear, please stop peering over my shoulder). We're coming up to our 12th anniversary together, and just past our 2nd year married. Living in sin was such fun, but it's all over now. Conventional at last. She's meant to be a vet, but the US authoritays don't realise that UK cats and dogs are uncannily similar to US cats and dogs and want her to pass some more exams. When my mum told me that a Cambridge degree was the key that unlocked the world, I believed her. They fuck you up, your mum and dad :-)\n\nShe has a [[hiking blog|http://eelpi.livejournal.com/]] which has some gorgeous pictures of the surrounding countryside. Volcanos rock. Quite literally.
That's me - Tom Forsyth. I'm now pretty much permanently TomF though - even when I write emails to people like my parents or my wife. It started at MuckyFoot where I was the second Tom to join, and Tom Ireland had already nabbed tom@muckyfoot.com, so I had to make do with tomf@muckyfoot.com. However, Tom Ireland would get a constant stream of mail meant for me, so I started to make a point of always signing myself TomF in the hope that people would remember to add the "F". It only partly worked, but the habit's kinda stuck now.\n\nThe main URL for my site is http://www.eelpi.gotdns.org/ When you go there, it redirects to whatever ISP I am paying money to this year, so don't use whatever address you see in your browser bar - stick to this one, coz it will move around as I do.\n\nMore:\n* EmailMe.\n* My surname has NoE.\n* Where I work: [[Intel|http://www.intel.com/]].\n* What I work on: [[Larrabee|http://www.google.com/search?q=larrabee+intel]] (remember - two Rs, one B)\n\nDisambiguation:\n* The Tom Forsyth who is on various game-related programming mailing lists ''is'' me.\n* The Tom Forsyth who is a Microsoft MVP ''is'' me.\n* The Tom Forsyth who has talked at GDC ''is'' me.\n* The Tom Forsyth who has written and edited programming book articles ''is'' me.\n* The Tom Forsyth who worked at [[MuckyFoot]] and [[RadGameTools]] ''is'' me.\n* The Tom Forsyth who works/worked for Nokia is ''not'' me.\n* The [[Tom Forsythe|http://www.tomforsythe.com/]] who takes photos of naked Barbie dolls is ''not'' me.\n* The [[Tom Forsyth|http://www.rangers.premiumtv.co.uk/page/HallOfFame/0,,5~529637,00.html]] who scored the winning goal in the 1973 Scottish Cup Final is obviously ''not'' me - I was busy being born that year. 
He also nabbed the [[Wikipedia entry|http://en.wikipedia.org/wiki/Tom_Forsyth]] - I really should have moved a bit quicker in anticipation of my eventual superstardom.\n* I am ''not'' significantly related to these or any other famous Forsyth or Forsythe (Frederick, Bruce, John, etc).\n\n[[Googlism|http://www.googlism.com/index.htm?ism=Tom+Forsyth&type=1]] says:\n* "tom forsyth is the manager of the rf test group at palm" - ''NO''\n* "tom forsyth is a bit of a visionary when it comes to lod" - ''YES''\n* "tom forsyth is founder of the eigg trust and a crofter at scoraig in wester ross" - ''NO''\n* "tom forsyth is a super" - ''NO''\n* "tom forsyth is one of the main guys behind startopia and also a frequent poster to the game dev algorithms mailing list" - ''YES''\n* "tom forsyth is on the board of the tmf which demonstrates our commitment to ngoss" - ''NO''\n* "tom forsyth is just" - ''YES''
AKA TomF. http://www.eelpi.gotdns.org/
I was talking to someone about simple in-game lighting models the other day and we realised we had totally different terminology for a bunch of lighting tricks that we'd each independently invented. With my instinctive loathing for reinventing the wheel, I wondered why this had happened. And the reason is that some lighting tricks are so simple they don't warrant a proper paper or a book article or a GDC talk, so they never get passed along to others. So I thought I'd write up a light type I've been using for a while that I call a "trilight" that I've not seen anyone else talk about. It's very simple, and I'm sure others have invented it too, but at least now I'm happy I've documented it. It's on my [[papers|http://www.eelpi.gotdns.org/papers/papers.html]] page, and it's got a simple demo to go with it.\n\nBy the way, it's really satisfying to write a small demo and a short paper and do it all start to finish in half a day for a change. Normally I tackle big problems like shadowing, which take months to code and lots of explanation. Excellent way to spend a Sunday afternoon - more people should try it.
Supposedly from the very famous "Four Yorkshiremen Sketch" from Monty Python. Reproduced here in its entirety just because it makes for easier cut'n'pasting. TheWife will kill you if you describe her as a Yorkshireman, because Yorkshire is on the ''OTHER SIDE OF THE FUCKING HILL''. She, as any well-brought-up midlander will be able to instantly tell from her authentic Estuary/Fenland accent, is from Lancashire. But that's an amusing anecdote for another time. Meanwhile, pop the stack and on with the primary anecdote:\n\nThe Players:\n Michael Palin - First Yorkshireman;\n Graham Chapman - Second Yorkshireman;\n Terry Jones - Third Yorkshireman;\n Eric Idle - Fourth Yorkshireman;\n\nThe Scene:\n Four well-dressed men are sitting together at a vacation resort.\n 'Farewell to Thee' is played in the background on Hawaiian guitar. \n\nFIRST YORKSHIREMAN:\n Aye, very passable, that, very passable bit of risotto.\nSECOND YORKSHIREMAN:\n Nothing like a good glass of Château de Chasselas, eh, Josiah?\nTHIRD YORKSHIREMAN:\n You're right there, Obadiah.\nFOURTH YORKSHIREMAN:\n Who'd have thought thirty year ago we'd all be sittin' here drinking Château de Chasselas, eh?\nFIRST YORKSHIREMAN:\n In them days we was glad to have the price of a cup o' tea.\nSECOND YORKSHIREMAN:\n A cup o' cold tea.\nFOURTH YORKSHIREMAN:\n Without milk or sugar.\nTHIRD YORKSHIREMAN:\n Or tea.\nFIRST YORKSHIREMAN:\n In a cracked cup, an' all.\nFOURTH YORKSHIREMAN:\n Oh, we never had a cup. We used to have to drink out of a rolled up newspaper.\nSECOND YORKSHIREMAN:\n The best we could manage was to suck on a piece of damp cloth.\nTHIRD YORKSHIREMAN:\n But you know, we were happy in those days, though we were poor.\nFIRST YORKSHIREMAN:\n Because we were poor. My old Dad used to say to me, "Money doesn't buy you happiness, son".\nFOURTH YORKSHIREMAN:\n Aye, 'e was right.\nFIRST YORKSHIREMAN:\n Aye, 'e was.\nFOURTH YORKSHIREMAN:\n I was happier then and I had nothin'. 
We used to live in this tiny old house with great big holes in the roof.\nSECOND YORKSHIREMAN:\n House! You were lucky to live in a house! We used to live in one room, all twenty-six of us, no furniture, 'alf the floor was missing, and we were all 'uddled together in one corner for fear of falling.\nTHIRD YORKSHIREMAN:\n Eh, you were lucky to have a room! We used to have to live in t' corridor!\nFIRST YORKSHIREMAN:\n Oh, we used to dream of livin' in a corridor! Would ha' been a palace to us. We used to live in an old water tank on a rubbish tip. We got woke up every morning by having a load of rotting fish dumped all over us! House? Huh.\nFOURTH YORKSHIREMAN:\n Well, when I say 'house' it was only a hole in the ground covered by a sheet of tarpaulin, but it was a house to us.\nSECOND YORKSHIREMAN:\n We were evicted from our 'ole in the ground; we 'ad to go and live in a lake.\nTHIRD YORKSHIREMAN:\n You were lucky to have a lake! There were a hundred and fifty of us living in t' shoebox in t' middle o' road.\nFIRST YORKSHIREMAN:\n Cardboard box?\nTHIRD YORKSHIREMAN:\n Aye.\nFIRST YORKSHIREMAN:\n You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t' mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi' his belt.\nSECOND YORKSHIREMAN:\n Luxury. We used to have to get out of the lake at six o'clock in the morning, clean the lake, eat a handful of 'ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!\nTHIRD YORKSHIREMAN:\n Well, of course, we had it tough. We used to 'ave to get up out of shoebox at twelve o'clock at night and lick road clean wit' tongue. 
We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit' bread knife.\nFOURTH YORKSHIREMAN:\n Right. I had to get up in the morning at ten o'clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.\nFIRST YORKSHIREMAN:\n And you try and tell the young people of today that ..... they won't believe you.\nALL:\n They won't!\n\n\n[[Geek culture|http://apple.slashdot.org/comments.pl?sid=138720&cid=11609478]] - fantastic!\n\nThe attentive will have noticed that the phrase "uphill both ways" does not in fact appear in this sketch. That's because a fairly cursory bit of research will reveal that it is from a fairly similar but entirely unrelated Bill Cosby sketch. Bill Cosby is not a native Yorkshireman. Not a lot of people know that.
Released on PC, ~PS1, Dreamcast. I joined [[MuckyFoot]] just before the PC version was finished, so I didn't have much to do with the game itself, but it's got a lot of neat stuff in it - I enjoyed playing it. I did the Dreamcast port after the PC version had shipped - just me and Karl Zielinski testing in nine months start to finish. The Dreamcast was a really nice machine to work with - it just did what it was meant to without any fuss - a big contrast to the drama the PS1 version was going through. It's a shame the DC died - it was a nice elegant bit of kit.
The [[Utah Teapotahedron|http://www.ics.uci.edu/~arvo/images.html]] is an awesome demo model. It's easily recognisable, so it's easy to tell when you've screwed up a scaling or are clamping N.L the wrong way or managed to get your backface culling wrong. It's got insta-create functions in 3DSMax and D3DX, and the spline data is easily available (see [[Steve Baker's excellent homage page|http://www.sjbaker.org/teapot/]]). It has areas of all types of curvature, it can project shadows onto itself really nicely (handle, spout, lid handle). And it's easy to throw a texture map onto it - apart from the pole on the lid, it's easy to parameterise.\n\nAlso, it helps a pet gripe of mine - algorithms that require non-intersecting watertight manifolds. The teapot is not closed and it self-intersects (the handle punches through the side). So it's really useful for pounding on algorithms that don't like either of those. Getting artists to generate watertight manifolds is a horrible job - please could all the researchers out there stop developing algorithms that require it? If your algorithm doesn't even handle the sixth platonic solid, it's not much use to anyone, is it?\n\nFrankly, the only reason [[Lenna|http://www.lenna.org/]] is a better icon for computer graphics is that she's been around longer, and she's naked.
I keep having this stupid conversation with people. It goes a little something like this:\n\nThem: I need to get a new card - one with dual-DVI. Any ideas?\nMe: You really need dual DVI?\nThem: Yeah, VGA's shit.\nMe: VGA's actually pretty good you know - are you running at some sort of crazy res?\nThem: Yeah - 1600x1200 - it's a blurry mess.\nMe: That's not crazy at all. It's well within VGA's capabilities.\nThem: No, analogue is crap for anything like that. You have to go digital. Especially if you're reading text.\nMe: Dude, at home I run an Iiyama 21-inch CRT at 1600x1200 at 85Hz on a VGA cable. I can write code all day with black text on white backgrounds. At work I run my second Samsung 213T off a VGA cable as well, and that's the screen I use for email - black on white again. They're both crystal-sharp.\nThem: Rubbish. I just tried it myself - it's an unreadable blurry mess at even 60Hz.\nMe: Are you by any chance ... using an nVidia graphics card?\nThem: Sure, but what's that got to do with it? Have you got some ninja bastard card?\nMe: An elderly and perfectly standard Radeon 9700.\nThem: I've got a 7800 GT - it should kick the shit out of that.\nMe: Yes, at shuffling pixels. But it's got an nVidia RAMDAC. Which is a large steaming pile of poo.\n\nSeriously - what the hell is up with nVidia and their RAMDACs? They've been shit right from day one in the NV1, they were shit when I worked at 3Dlabs and the $50 Permedia2 had an infinitely superior display quality to their top-end GeForce2s, and they've continued that grand tradition right up to current cards. That was acceptable when a RivaTNT cost $50, but now they're asking $1000 for an SLI rig. My boss was trying to get two monitors hooked up to a fancy nVidia card that only had one DVI port on it, and whichever monitor he plugged into the VGA port was ghosting like crazy. Swap the card out for a cheap no-frills Radeon X300 and hey - lovely picture on both.\n\nNow you're going to think I'm an ATI fanboi. 
And I am, because I like elegant orthogonal hardware. But I'm not saying ATI RAMDACs are great - it's just that they don't suck. Matrox, 3Dlabs and Intel all have decent RAMDACs. Even the S3 and PowerVR zombies have better RAMDACs. Beaten by S3! That's absurd.\n\nFor example, a colleague had a high-spec "desktop replacement" laptop with an nVidia chipset of some sort that he could never get to drive his cheap CRT with half-readable text. Naturally he blamed the monitor. He's recently replaced it with a new Dell 700m, and it drives the CRT wonderfully. This is a $5 Intel graphics chip in a laptop! It's totally worthless as a 3D card, but even it does orders of magnitude better than the nVidia cards at running a CRT.\n\nThe one time nVidia cards have decent RAMDACs is when the RAMDAC is made by someone else. Some of their "multi media" cards with the fancy TV in/out stuff have a nice external RAMDAC made by someone else, and apparently (never tried them myself) they work just fine. I'm all for new tech, but we've all been bounced into switching to DVI for such bogus reasons - monitor sizes just aren't growing that fast.\n\nSo if someone tells you that old steam-powered analogue VGA is totally obsolete because DVI quality is just sooooo much better, ask them if they've got an nVidia graphics card.
I've added a version of my VIPM comparison article to my website in the [[papers section|http://www.eelpi.gotdns.org/papers/papers.html]] down the bottom. It's always nice to have articles "alive" and searchable by engines. I added some hindsights about the process of integrating VIPM into Blade, since the article was written before that had been done.
Woohoo! Finally got off my arse and wrote a paper without some editor nagging me that it was two weeks overdue!\n\nIn my day job doing [[Granny3D]], we have a mesh preprocessor, and I was looking for a vertex-cache optimiser to add to it. But the only ones I knew of were either too complex for me to understand, too special-case, or proprietary. I'd written a vertex-cache optimiser thing with lookahead before, and it was mind-numbingly slow. Truly, awfully slow. Even with a lookahead of only six triangles, it was pathetically slow (I tried seven - it didn't finish after three hours running time). So I thought well, I'm not doing one of those again.\n\nSo I tried just writing a no-lookahead, no-backtrack one, because anything is better than [[a slap in the face with an angry herring|Strippers]], and I basically made up some heuristics that I thought would work. And it behaved scarily well. So I tweaked it a bit, and it went better. And then I wrote a thing to anneal the "magic constants", and it went really really well. So well, it got within ten percent of the optimal result. Then I compared it to the variety of scary (but excellent) papers I'd read and it was really close to them. I haven't done an apples-to-apples comparison yet, but the eyeballed mammoths-to-heffalumps comparo looked rather promising. And it's orders of magnitude faster, and way easier to understand. And then [[Eric Haines|http://www.acm.org/tog/editors/erich/]] asked a question about this on the ~GDAlgo list, and that was too much for me to stand, so I wrote the thing up and put it [[here|http://www.eelpi.gotdns.org/papers/fast_vert_cache_opt.html]]. I intend to keep it relatively "live", so if you have any feedback, fire away.
It's a Wiki. Sort of. A Wiki is a collaborative document with really simple editing and hyperlinking so that even idiots like me can create documents with lots of confusing links to random pages. The simplest way to make a new link is to WriteInCamelCapsLikeThis, which is why some of the words seem really stupid. Just learn to live with it - the less time I have to spend thinking about formatting, the more likely I am to deign to write down my pearls of wisdom, and the more likely you are to be raised to enlightenment. You lucky people.\n\nQuick guide:\n* click on blue links to open new topics (technically called "tiddlers").\n* hover over open topics to get a little toolbar in the top right with a list of actions.\n* click on "close" to close the current topic.\n* click on "close others" to close everything but this topic.\n* some pages have tags which categorise them. This one has the tag "help". You can select the "Tags" field on the far right and see a list of all the tags, and browse the topics that use those tags. Handy if all you want is tech stuff and not a lot of the random ramblings that go with them.\n* experiment - there's no way you can harm this Wiki by playing - even if it looks like you can (because you can't save your changes).\n\nNote it's only sort-of a Wiki, because although it looks like you can edit it, you can't save your edits. It lives on my server, so only I can edit it, so ner! Obviously you can save them to your hard-drive, but that's unlikely to be that interesting - nobody else will see them. 
If you want to play with real Wikis, there's the original WikiWikiWeb (http://c2.com/cgi/wiki) from which all Wikis are ultimately descended, and probably the most famous is the Wikipedia (http://en.wikipedia.org/wiki/Main_Page), which is an Encyclopedia that is a Wiki - created for geeks by geeks, and thus contains far more about Pokemon characters (http://en.wikipedia.org/wiki/Jigglypuff) than it does on ball gowns (http://en.wikipedia.org/wiki/Ballgown).\n\nFor the technically-minded, this particular breed of Wiki is a TiddlyWiki, and you can read all about it at http://www.tiddlywiki.com/
I didn't say the pages contained anything useful. Move your mouse over this text - see how there's a line of text starting with "close" to the upper left? Click the word.