Jagware
Orion_

Jag Raytracer


Orion_    1

This is a rough little attempt at fixed-point raytracing on the Jaguar.

The raytracer is 100% GPU code (836 bytes).

 

I know it doesn't really look like a raytracer, but it actually is one :)

The bad colors and poor precision are due to the fixed point :/

It's not optimised; it was only a little technical test to see how fast this would be on the GPU ^^

This was coded on the Project Tempest emulator, but it also works on a real Jaguar, except that there seem to be some glitches with the Jagware logo :/

 

anyway, enjoy :)

 

(promise, I will try to start making some more useful stuff from now ^^)

 

Jag Raytracer

 

raytracer.png

GT Turbo    3
>(promise, I will try to start making some more useful stuff from now ^^)

 

Nice attempt Orion, we're waiting for the next thing :flowers:

 

 

GT :poulpe:


Cool! =)

 

I have no idea how it actually works, but I can't imagine it is what I assume "raytracing" to be (i.e. a lot of vector algebra, cross products, dot products, and stuff), since I'd assume that code to be bigger than 836 bytes ;)

There must be some simplification involved... or?

 

Is that 3 shaded spheres, or just one (the green one)? I mean, regarding fixed point, why do the red and blue ones look more smoothly shaded than the green one? Wouldn't the math be the same?

Are there some Z values involved? And how do you calculate the shade values?

 

Just curious :D

Is it .8 or .16 fixed point? (Or signed 7.8 or something?)

And sorry for being "nosy" :) but I'm just curious whether you use vector math or not, and whether that is the precision you get with fixed point.

 

It's nice work, Orion! =)

 

 

 

Some thoughts:

- Could the DSP do half of the screen? (i.e. ~2x faster, not counting the 16-bit bus to the DSP, hence the "~")

- Do you store pixels to screen space with external storew?
A good idea would be to let the GPU build, say, one scanline of the screen in GPU memory, and then phrase-blit the whole scanline to screen space... killing all external stores!
Combine this with the first point, letting the DSP do the same thing for the other half of the screen.

- Another GPU trick is to do 2 pixels at the same time. I don't know exactly what the inner loop looks like, but usually you can optimize it by interleaving 2 similar calculations, building 2 pixels at once and making 1 store to GPU memory (storing two 16-bit pixels at the same time), and then doing the phrase blit to the screen after each scanline.
(This might double part of the code size (and the register usage), since you have to do the same thing twice, and you might need special "pre-loop" setups to get it working (common on the Falcon's DSP), but you kill all the wait states you might have in the code and halve the loop count, making use of true RISC power (1 tick/instruction).)
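The scanline-buffer idea above can be sketched roughly as follows. This is only an illustration in Python, not Jaguar GPU code: `shade_pixel`, `render_scanline`, `blit_scanline` and the tiny `WIDTH` are hypothetical names and values, the slice copy merely stands in for the blitter, and putting the left pixel in the upper 16 bits of each word is an assumption about pixel order.

```python
WIDTH = 8  # hypothetical scanline width in pixels (must be even)

def shade_pixel(x, y):
    """Stand-in for the real per-pixel raytracing; returns a 16-bit pixel."""
    return (x * 31 + y) & 0xFFFF

def render_scanline(y, width=WIDTH):
    """Build one scanline in a local buffer, two 16-bit pixels per 32-bit word."""
    line = []
    for x in range(0, width, 2):
        left = shade_pixel(x, y)        # the two calculations could be interleaved
        right = shade_pixel(x + 1, y)
        line.append((left << 16) | right)  # one internal store for two pixels
    return line

def blit_scanline(screen, y, line, width=WIDTH):
    """Copy the finished scanline to the screen buffer in one bulk transfer."""
    words_per_line = width // 2
    start = y * words_per_line
    screen[start:start + words_per_line] = line

# Usage: a 4-line screen, rendered one scanline at a time.
screen = [0] * (4 * (WIDTH // 2))
for y in range(4):
    blit_scanline(screen, y, render_scanline(y))
```

The loop runs `WIDTH // 2` times per line instead of `WIDTH`, and the only write to the (external) screen buffer is the bulk copy at the end of each line.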

 

>I will try to start making some more useful stuff from now ^^)

 

Ohh... it was that then ;)

But this could be "useful"! ;) In my eyes it is definitely NOT a waste of time anyway :P

 

cheers

Orion_    1
>I have no idea how it actually works, but I can't imagine it is what I assume "raytracing" to be (i.e. a lot of vector algebra, cross products, dot products, and stuff), since I'd assume that code to be bigger than 836 bytes ;)
>
>There must be some simplification involved... or?

It is real raytracing: only one light and 3 shaded spheres, with no rotation, but it uses lots of dot products, 2 square roots, and some divisions, all of that for each pixel. The GPU is quite fast, I was impressed, though the fixed point helps make it fast too.

I can release the source code if you want :) but it's not really clean, and it has a dirty last-minute hack because I had problems with some distance comparisons.
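To give an idea of the kind of per-pixel work described above, here is a minimal sketch of a ray/sphere intersection test in 8-bit-fraction fixed point. It is written in Python for readability and is not the actual GPU code: all the helper names (`fx`, `fx_mul`, `fx_dot`, `fx_sqrt`, `hit_sphere`) are made up for this example, the ray direction is assumed to be normalized, and intermediate values are not clamped to 16 bits as they would be in real registers.

```python
from math import isqrt

FRAC_BITS = 8                 # 8 fraction bits, as in the ".8" format
ONE = 1 << FRAC_BITS          # 1.0 in fixed point

def fx(x):
    """Convert a float to fixed point."""
    return int(round(x * ONE))

def fx_mul(a, b):
    """Fixed-point multiply: integer product, then drop 8 fraction bits."""
    return (a * b) >> FRAC_BITS

def fx_dot(u, v):
    """Fixed-point dot product of two 3-component vectors."""
    return (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) >> FRAC_BITS

def fx_sqrt(a):
    """Fixed-point square root (pre-shift so the result keeps 8 fraction bits)."""
    return isqrt(a << FRAC_BITS)

def hit_sphere(origin, direction, center, radius):
    """Return the fixed-point distance to the nearest hit, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = fx_dot(oc, direction)
    c = fx_dot(oc, oc) - fx_mul(radius, radius)
    disc = fx_mul(b, b) - c           # discriminant of the quadratic in t
    if disc < 0:
        return None                   # the ray misses the sphere
    t = -b - fx_sqrt(disc)            # nearest of the two intersections
    return t if t > 0 else None

# Usage: a ray along +Z from the origin, unit sphere centered 5 units away.
eye = (fx(0), fx(0), fx(0))
ray = (fx(0), fx(0), fx(1))
t = hit_sphere(eye, ray, (fx(0), fx(0), fx(5)), fx(1))
print(t / ONE)   # distance to the front of the sphere
```

With several spheres, the same test would be run for each one and the smallest positive `t` kept, which is presumably where distance comparisons come in.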

 

>Is that 3 shaded spheres, or just one (the green one)? I mean, regarding fixed point, why do the red and blue ones look more smoothly shaded than the green one? Wouldn't the math be the same?
>
>Are there some Z values involved? And how do you calculate the shade values?

The green one is closer to the camera than the 2 others, and I think the shaded colors are ugly because of the fixed point and the loss of precision :/

The shade value is calculated by doing a crossproduct between the light vector (from the light to the intersected point on the sphere) and the normal at the intersected point on the sphere's surface.

 

>Is it .8 or .16 fixed point? (Or signed 7.8 or something?)

Signed .8 (otherwise I think it would overflow).

 

>- Could the DSP do half of the screen? (i.e. ~2x faster, not counting the 16-bit bus to the DSP, hence the "~")
>
>- Do you store pixels to screen space with external storew?
>A good idea would be to let the GPU build, say, one scanline of the screen in GPU memory, and then phrase-blit the whole scanline to screen space... killing all external stores!
>Combine this with the first point, letting the DSP do the same thing for the other half of the screen.

(I use external storew.)

Actually, I was thinking of doing that before starting the raytracer; that's why I tried a simple test with the GPU, to predict how fast it would be with those optimisations. Even reducing the screen to 160x120, I don't think I would do better than 2fps, so I don't know if it would be worth it.


So it's raytracing, nice! =)

Well, source is always nice, and I'm interested in the topic, so I could do a quick check for any obvious optimisation possibilities. So if you want to, then please do so =)

(I have some PC C source for this that I planned to dig into and convert, but I never got around to it.)

 

>shadevalue is calculated doing crossproduct

You mean dot product? ;) But still... you mean you do a cross product at each pixel of the sphere to find the normal and then do a light*normal to get the shade? That's still A LOT of work (and it is the true "algebra way"), so then it isn't that bad speed-wise =)
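For reference, the usual dot-product (Lambert) shading for a sphere can be sketched like this. It is an assumption about the general technique, not a description of Orion's code: on a sphere the normal is simply the hit point minus the center, divided by the radius, so no cross product is needed. Here the light vector points from the hit point towards the light (the opposite convention just flips the sign), and every helper name is hypothetical.

```python
from math import isqrt

FRAC_BITS = 8
ONE = 1 << FRAC_BITS          # 1.0 in fixed point

def fx(x):
    """Convert a float to fixed point with 8 fraction bits."""
    return int(round(x * ONE))

def fx_dot(u, v):
    return (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) >> FRAC_BITS

def fx_div(a, b):
    """Fixed-point divide (Python's // rounds towards minus infinity)."""
    return (a << FRAC_BITS) // b

def fx_sqrt(a):
    return isqrt(a << FRAC_BITS)

def lambert_shade(hit, center, radius, light):
    """Return a shade in [0, ONE] from the dot product of normal and light."""
    # Sphere normal: (hit - center) / radius, already unit length.
    normal = tuple(fx_div(h - c, radius) for h, c in zip(hit, center))
    # Unit vector from the hit point towards the light (one sqrt, three divs).
    to_light = tuple(l - h for l, h in zip(light, hit))
    length = fx_sqrt(fx_dot(to_light, to_light))
    to_light = tuple(fx_div(v, length) for v in to_light)
    return max(0, fx_dot(normal, to_light))   # clamp surfaces facing away

# Usage: hit point on the front of a unit sphere centered at z = 5.
center = (fx(0), fx(0), fx(5))
hit = (fx(0), fx(0), fx(4))
print(lambert_shade(hit, center, fx(1), (fx(0), fx(0), fx(0))) / ONE)  # lit head-on
print(lambert_shade(hit, center, fx(1), (fx(0), fx(4), fx(0))) / ONE)  # lit at 45 degrees
```

With only 8 fraction bits, the normalized vectors carry visible rounding error, which is one plausible source of the banding in the shaded colors.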

 

 

>signed .8 (else I think it will overflow)

Well, it might still overflow if you do MACs, but with s7.8 (signed, 7 integer bits, 8 fraction bits) you get the correct sign with the built-in mult instructions. If you go for s15.16 then you get higher precision, but you need a lot more work :(
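As a small illustration of the s7.8 trade-off mentioned above, the sketch below models a signed 16-bit value with 8 fraction bits and a multiply that produces a wider product before shifting back. It is plain Python, not GPU code; `to_fix`, `to_float`, `wrap16` and `fix_mul` are names invented for this example.

```python
FRAC_BITS = 8            # s7.8: 1 sign bit, 7 integer bits, 8 fraction bits
ONE = 1 << FRAC_BITS     # 1.0 in fixed point

def to_fix(x):
    """Convert a float to s7.8, rounding to the nearest step of 1/256."""
    return int(round(x * ONE))

def to_float(f):
    return f / ONE

def wrap16(v):
    """Wrap an integer to signed 16 bits, as a 16-bit value would."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def fix_mul(a, b):
    """Signed multiply into a wide product, then shift back to 8 fraction bits."""
    return (a * b) >> FRAC_BITS

# The sign comes out right from a plain signed multiply.
p = fix_mul(to_fix(1.5), to_fix(-2.25))
print(to_float(p))                      # 1.5 * -2.25

# Precision: the smallest step is 1/256, so 0.1 is only approximated.
print(to_float(to_fix(0.1)))

# Overflow: 100 * 2 does not fit in 7 integer bits and wraps around.
big = wrap16(fix_mul(to_fix(100.0), to_fix(2.0)))
print(to_float(big))
```

An s15.16 format would push both limits out, but each multiply then needs a 64-bit intermediate product, which is the extra work referred to above.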

 

 

>I use external storew

Well, if there is nothing else on the bus then perhaps, but in the worst case 1 storew will take as long as it takes the OP to process all the objects in the object list, since the OP hogs the bus while it is processing the OL. Storing internally is a much better way, since the GPU can always do that independently of what the rest of the system is doing.

I did this when I did the maniac optimisation of my fire routine: the first version used storew, and it became much, much faster to build a scanline internally and phrase-blit it.

I noticed this even more with the "water" routine I made (aarrgghhh!). I ended up with a circular scanline buffer that was phrase-blitted in and out, and calculated 2 pixels at a time with half the loop count, and that was very much faster than doing 5 or 6 external memory accesses for each pixel... but it becomes a heck of a lot more complex.

Ahh well...

 

It's still nice work you did! =)

And it IS useful! I can imagine a 96K TYS demo where that could be a "gfx renderer" instead of storing the bitmap ;) ..so..

Cheers!
