Orion_ Posted May 11, 2006

This is a small, rough attempt at fixed-point raytracing on the Jaguar. The raytracer is 100% GPU code (836 bytes). I know it doesn't really look like a raytracer, but it actually is one; the bad colors and precision are due to the fixed point :/ It's not optimised, it was only a little technical test to see how fast this would be on the GPU ^^ It was coded on the Project Tempest emulator, but it also works on a real Jaguar, except that there seem to be some glitches with the Jagware logo :/ Anyway, enjoy (promise, I will try to start making some more useful stuff from now ^^)

Jag Raytracer
GT Turbo Posted May 11, 2006

>(promise, I will try to start making some more useful stuff from now ^^)

Nice attempt Orion, we're waiting for the next thing.

GT
Zerosquare Posted May 11, 2006

Lovely
Symmetry of TNG Posted May 13, 2006

Cool! =) I have no idea how it actually works, but I can't imagine it is what I assume "raytracing" to be (i.e. a lot of vector algebra, cross products, dot products and so on). Since I assume that code to be bigger than 836 bytes, there must be some simplification involved... or?

Is that 3 shaded spheres, or just one (the green one)? I mean, regarding fixed point, why do the red and blue ones look more smoothly shaded than the green one? Wouldn't the math be the same? Are there some Z values involved? And how do you calculate the shade values?

Just curious: is it .8 or .16 fixed point (or signed 7.8 or something)? And sorry for being "nosy", but I'm just curious whether you use vector math or not, and whether that is the precision you get with the fixed point.

It's nice work, Orion! =)

Some thoughts:

- Could the DSP do half of the screen? (i.e. ~2x faster, not counting the 16-bit bus to the DSP, hence the "~")
- Do you storew pixels to screen space with external stores? A good idea would be to let the GPU build, say, 1 scanline of the screen in GPU memory, and then phrase-blit a whole scanline to screen space, killing all external stores! Combine this with the 1st point, letting the DSP do the same thing for the other half of the screen.
- Another GPU trick is to do 2 pixels at the same time. (I don't know exactly what the inner loop looks like, but) usually you can optimize it by interleaving 2 similar calculations, building 2 pixels at the same time, and making 1 store to GPU memory (storing two 16-bit pixels at once), and then after a scanline doing the phrase blit to screen. This might double part of the code size (and register usage), since you have to do the same thing twice, and you might need special "pre-loop" setups to get it working (common on the Falcon's DSP), but you kill all the wait states that you might have in the code and halve the loop count, making use of true RISC power (1 tick/instruction).

>I will try to start making some more useful stuff from now ^^)

Ohh... it was that then. But this could be "useful"! In my eyes it is definitely NOT a waste of time anyway.

Cheers
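The scanline-buffer and two-pixels-per-store suggestions above can be sketched in portable C. This is only an illustration under stated assumptions, not code from either poster: `LINE_WIDTH`, `shade_fn`, `demo_shade` and `render_scanline` are made-up names, a plain array stands in for GPU local RAM, `memcpy` stands in for the phrase blit, and the screen is treated as an array of 32-bit words holding two 16-bit pixels each.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_WIDTH 320   /* assumed screen width in pixels; must be even */

/* Hypothetical per-pixel routine standing in for the raytracer's inner loop. */
typedef uint16_t (*shade_fn)(int x, int y);

/* Dummy shader for demonstration: row number in the high byte, column in the low byte. */
uint16_t demo_shade(int x, int y)
{
    return (uint16_t)(((y & 0xFF) << 8) | (x & 0xFF));
}

/* Build one scanline in a small local buffer, two pixels per 32-bit store,
   then copy the finished line to the screen in a single block transfer. */
void render_scanline(uint32_t *screen_words, int y, shade_fn shade)
{
    uint32_t line[LINE_WIDTH / 2];   /* stands in for GPU local RAM */

    assert(LINE_WIDTH % 2 == 0);
    for (int x = 0; x < LINE_WIDTH; x += 2) {
        uint32_t left  = shade(x, y);
        uint32_t right = shade(x + 1, y);
        /* Assumption: left pixel in the high half, as on a big-endian machine. */
        line[x / 2] = (left << 16) | right;
    }
    /* One transfer per line instead of one external store per pixel. */
    memcpy(screen_words + (size_t)y * (LINE_WIDTH / 2), line, sizeof line);
}
```

The inner loop only ever writes to the local buffer, so main memory is touched once per line; the loop also runs half as many iterations as a one-pixel-per-pass version.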
Orion_ Posted May 13, 2006 Author

>I have no idea how it actually works, but I can't imagine it is what I assume "raytracing" to be (i.e. a lot of vector algebra, cross products, dot products and so on). Since I assume that code to be bigger than 836 bytes, there must be some simplification involved... or?

It is real raytracing: only one light and 3 shaded spheres, no rotation, but it uses a lot of dot products, 2 square roots and some divisions, and that for each pixel. The GPU is quite fast, I was impressed, but the fixed point makes this fast too. I can release the source code if you want, but it's not really clean source code, and it has some dirty last-minute hacks because I had problems with some distance comparisons.

>Is that 3 shaded spheres, or just one (the green one)? I mean, regarding fixed point, why do the red and blue ones look more smoothly shaded than the green one? Wouldn't the math be the same? Are there some Z values involved? And how do you calculate the shade values?

The green one is closer to the camera than the 2 others, and I think that because of the fixed point and the loss of precision, the shaded colors are ugly :/ The shade value is calculated by doing a cross product between the light vector (from the light to the intersected point on the sphere) and the normal at the intersected point of the sphere surface.

>is it .8 or .16 fixed point (or signed 7.8 or something)?

Signed .8 (else I think it will overflow).

>- Could the DSP do half of the screen? (i.e. ~2x faster, not counting the 16-bit bus to the DSP, hence the "~")
>- Do you storew pixels to screen space with external stores? A good idea would be to let the GPU build, say, 1 scanline of the screen in GPU memory, and then phrase-blit a whole scanline to screen space, killing all external stores! Combine this with the 1st point, letting the DSP do the same thing for the other half of the screen.

(I use external storew.) Actually I was thinking of doing that before starting the raytracer; that's why I tried a simple test with the GPU to predict how fast it would be using those optimisations. Even by reducing the screen to 160x120 I think I will not do better than 2 fps, so I don't know if that would be worth it.
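For readers wondering what "a lot of dot products, 2 square roots and some divisions" per pixel could look like, here is a minimal C sketch of one ray against one sphere with one light, using 8 fractional bits. It is not Orion_'s code, which was not posted in this thread: every name here (`fix`, `vec3`, `trace_sphere`, and so on) is hypothetical, values are kept in 32-bit integers with 64-bit intermediates so that the sketch cannot overflow, and the shade is computed with a dot product between the unit normal and the unit vector from the hit point towards the light.

```c
#include <assert.h>
#include <stdint.h>

#define ONE 256                  /* 1.0 with 8 fractional bits */

typedef int32_t fix;             /* real value * 256 */
typedef struct { fix x, y, z; } vec3;

static fix fix_mul(fix a, fix b) { return (fix)(((int64_t)a * b) / ONE); }
static fix fix_div(fix a, fix b) { assert(b != 0); return (fix)(((int64_t)a * ONE) / b); }

/* Integer square root (rounded down), classic bit-by-bit method. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t r = 0, bit = (uint64_t)1 << 62;
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= r + bit) { n -= r + bit; r = (r >> 1) + bit; }
        else              { r >>= 1; }
        bit >>= 2;
    }
    return (uint32_t)r;
}

/* sqrt(v / 256) * 256 equals sqrt(v * 256), so scale before taking the root. */
static fix fix_sqrt(fix v) { assert(v >= 0); return (fix)isqrt64((uint64_t)v * ONE); }

static vec3 sub(vec3 a, vec3 b) { vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static fix  dot(vec3 a, vec3 b) { return fix_mul(a.x, b.x) + fix_mul(a.y, b.y) + fix_mul(a.z, b.z); }
static vec3 div3(vec3 v, fix d) { vec3 r = { fix_div(v.x, d), fix_div(v.y, d), fix_div(v.z, d) }; return r; }

/* Returns a diffuse shade in [0, ONE], or -1 if the ray misses the sphere.
   dir is assumed to be unit length. */
fix trace_sphere(vec3 origin, vec3 dir, vec3 center, fix radius, vec3 light)
{
    vec3 oc  = sub(origin, center);
    fix b    = dot(oc, dir);
    fix c    = dot(oc, oc) - fix_mul(radius, radius);
    fix disc = fix_mul(b, b) - c;
    if (disc < 0) return -1;                       /* no intersection */

    fix t = -b - fix_sqrt(disc);                   /* square root 1: nearest hit */
    if (t <= 0) return -1;                         /* sphere is behind the origin */

    vec3 hit = { origin.x + fix_mul(dir.x, t),
                 origin.y + fix_mul(dir.y, t),
                 origin.z + fix_mul(dir.z, t) };
    vec3 normal   = div3(sub(hit, center), radius);
    vec3 to_light = sub(light, hit);
    fix len = fix_sqrt(dot(to_light, to_light));   /* square root 2: normalise */
    if (len == 0) return 0;

    fix shade = dot(normal, div3(to_light, len));
    return shade < 0 ? 0 : shade;                  /* facing away from the light */
}
```

With only 8 fractional bits, every multiply and divide throws away low-order bits, which is one plausible source of the banding discussed above.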
Symmetry of TNG Posted May 14, 2006

So it's raytracing, nice! =) Well, source is always nice, and I'm interested in the topic, and I could do a quick check to see if I notice some obvious optimisation possibilities. So if you want to, then please do so =) (I have some PC C source for this that I planned to dig into and convert, but I never got to that.)

>shadevalue is calculated doing crossproduct

You mean dot product? But still, you mean you do a cross product at each pixel of the sphere to find the normal and then do a light*normal to get the shade? That's still A LOT of work (and it is the true "algebra way"), so then it isn't that bad speed-wise =)

>signed .8 (else I think it will overflow)

Well, it might still overflow if you do MACs, but with s7.8 (signed, 7 integer bits, 8 fractional bits) you get the correct sign with the built-in mult instructions. If you go for s15.16 then you get higher precision, but you need a lot more work.

>I use external storew

Well, if there is nothing else on the bus then perhaps, but in the worst case 1 storew will take the time it takes for the OP (Object Processor) to do all the objects in the object list, since the OP hogs the bus while it is processing the OL. Storing internally is then a much better way, since the GPU can always do that independently of what the rest of the system is doing. I did this when I did the maniac optimisation of my fire routine: the first version used storew, and it became much, much faster to build a scanline internally and phrase-blit it. I noticed this even more with the "water" routine I made (aarrgghhh!): I ended up with a circular scanline buffer that was phrase-blitted in and out, and calculated 2 pixels at a time with half the loop count, and that was far faster than doing 5 or 6 external memory accesses for each pixel. But it becomes a heck of a lot more complex... ah well.

It's still nice work you did! =) And it IS useful! I can imagine a 96K TYS demo, and that could be a "gfx renderer" instead of storing the bitmap.

So, cheers!
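The s7.8 versus s15.16 trade-off mentioned above can be made concrete with a small C sketch. The assumptions are mine rather than the posters': the type and function names are invented for illustration, and the hardware multiply is modelled as a signed 16x16 to 32-bit operation. An s7.8 product is then a single multiply, a sum of such products can outgrow 32 bits, and an s15.16 product has to be assembled from four 16-bit partial products.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t s7_8;     /* 1 sign bit, 7 integer bits, 8 fractional bits */
typedef int32_t s15_16;   /* 1 sign bit, 15 integer bits, 16 fractional bits */

/* One signed 16x16 -> 32 multiply; the result carries 16 fractional bits. */
int32_t mul_s7_8_wide(s7_8 a, s7_8 b) { return (int32_t)a * (int32_t)b; }

/* Would the product, scaled back to 8 fractional bits, still fit in s7.8? */
int fits_s7_8(int32_t wide)
{
    int32_t v = wide / 256;
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* MAC-style dot product. Summed in 64 bits here so the sketch itself stays
   well defined; a 32-bit accumulator could overflow on the same inputs. */
int64_t dot3_wide(const s7_8 a[3], const s7_8 b[3])
{
    int64_t acc = 0;
    for (int i = 0; i < 3; i++) acc += mul_s7_8_wide(a[i], b[i]);
    return acc;
}

/* Reference s15.16 multiply using a 64-bit intermediate. */
s15_16 mul_s15_16(s15_16 a, s15_16 b) { return (s15_16)(((int64_t)a * b) / 65536); }

/* The same result built from four 16x16 partial products on the magnitudes.
   Assumes the true result fits in s15.16. */
s15_16 mul_s15_16_split(s15_16 a, s15_16 b)
{
    int negative = (a < 0) != (b < 0);
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t ah = ua >> 16, al = ua & 0xFFFFu;
    uint32_t bh = ub >> 16, bl = ub & 0xFFFFu;

    assert(ah * bh < 0x8000u);   /* integer parts must not overflow 15 bits */
    uint32_t r = ((ah * bh) << 16) + ah * bl + al * bh + ((al * bl) >> 16);
    return negative ? -(int32_t)r : (int32_t)r;
}
```

For example, 16.0 * 16.0 in s7.8 is a perfectly valid wide product but no longer fits once scaled back to 16 bits, while three worst-case products summed together exceed the 32-bit range.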