Posts posted by SCPCD
-
-
Happy birthday Azrael ! -
happy birthday !
-
Happy birthday Fredifredo !!! Sorry for being late.
-
Yeah, the OP is running (though the machine is in VBlank and so it should only be executing branch and stop objects). It's actually just my JagLion code except I changed the initialization of the GPU to set DMAEN, and the only place that uses STOREP is the buffer clear function. Instead of clearing it I got a regular pattern.
If you want to try it, the latest version went on my site today (just a lot of little bugfixes to improve stability, and I was using it as a test base for my project). http://harmlesslion.com/software/jaglion
You'll find the line for setting DMAEN commented out in CALCMAND.S on line 240.
Of course, if you find a bug in my work there that'd be great to clear that pending issue. Or even if you confirm it.
I think I have found why it doesn't work:
you set the GPU's DMA priority mode when it starts, which means the GPU always runs at DMA priority, so every external access made by the GPU core takes over the bus.
For example:
        loadw   (r20),r24       ; save old scanline
.lwait1:
        nop
        loadw   (r20),r4
        and     r18,r4          ; mask VC off to just a line counter
        shrq    #1,r4           ; divide by 2
        cmp     r4,r24          ; wait for a change
        jr      EQ,.lwait1
        nop
        move    r4,r24          ; save the old value
r20 points to the VC register, which is external to the GPU core, like every register that is not in the GPU section (all registers from page 10 to 17 are treated as external registers, just like the Jerry registers).
This way, the GPU takes priority during OP processing.
I think strange things could happen in that case.
One way to have DMA priority only during the VBLANK is to add:
cpuint0:
        movei   #G_FLAGS,r30
        load    (r30),r29       ; read flags
        bset    #15,r29         ; DMA mode
        store   r29,(r30)
        nop
at the start of the interrupt routine, and to modify the interrupt exit routine like this:
exitint:                        ; finished interrupt, clean up
        movei   #G_FLAGS,r30
        load    (r30),r29       ; read flags
        bclr    #3,r29          ; clear IMASK
        bset    #9,r29          ; reset CPU int
        bclr    #15,r29         ; return to normal mode
The GPU then runs in normal priority mode during normal operation, switches to the higher priority level during the interrupt routine, and returns to normal mode after the interrupt.
-
So let's deviate from the original subject. Does it mean that, for leaving an interrupt handler, the sample code
given by Atari does not always work?
WARNING - writing a value to the flag bits and making use of those flag bits in the following instruction will not work properly due to pipe-lining effects. If it is necessary to use flags set by a STORE instruction, then
ensure that at least one other instruction lies between the STORE and the flags dependent instruction.
It means that the instruction right after the STORE must not use any bit of the flags.
For example:
        movei   #GPU_FLAGS,r30
        load    (r30),r29
        bset    #14,r29         ; use of BANK1
        store   r29,(r30)
        jump    T,(r3)          ; r3 is valid in BANK1
        nop
This code is not safe, because it is not certain that the r3 used by the jump instruction is read from BANK1.
But:
        movei   #GPU_FLAGS,r30
        load    (r30),r29
        bset    #14,r29         ; use of BANK1
        store   r29,(r30)
        nop
        jump    T,(r3)          ; r3 is valid in BANK1
        nop
is safe, because there is a wait slot that gives the pipeline time to write the r29 value to the GPU_FLAGS register correctly.
The next code is also safe:
        movei   #GPU_FLAGS,r30
        load    (r30),r29
        bset    #14,r29         ; use of BANK1
        jump    T,(r3)          ; r3 is valid in BANK0 <- Warning: bank 0, not bank 1!
        store   r29,(r30)
because:
- the jump instruction reads the r3 register and the flags (which are not used here because of T) before the store instruction is executed,
- then the store instruction is executed,
- and then the jump is taken.
int_serv:
        movei   GPU_FLAGS,r30   ; point R30 at flags register
        load    (r30),r29       ; get flags
        bclr    3,r29           ; clear IMASK
        bset    11,r29          ; and interrupt 2 latch
        load    (r31),r28       ; get last instruction address
        addq    2,r28           ; point at next to be executed
        addq    4,r31           ; updating the stack pointer
        jump    (r28)           ; and return
        store   r29,(r30)       ; restore flags
Is the restore flags instruction in the right place?
Yes, because we must read r28 from the correct bank (which at that point is BANK0, selected by IMASK = 1), and consequently we must read r28 before writing to the flags register.
And since we must return to the previous bank (0 or 1, depending on the bank in use before the interrupt) before returning to the interrupted code, the only good place for the store is the "nop" slot of the jump.
If I understand the warning correctly, they say that it is not safe to put the last store after the jump.
No, it's just that the coder should take the pipeline effect into account when he modifies the flags register for the next instruction (as with all instructions that don't have writeback protection).
This theory is consistent with the following remark (found in the bug sections):
· We've found that you can't put the IMASK clear in the delay slot of the jump out of the interrupt, because
the instruction that was interrupted may not get the correct register bank (TWI - Brian McKee)
Has anybody experienced such things?
I may have a (random) bug related to this.
Except that this bug belongs to the "lies and damned lies": it is not true.
If this didn't work, it would be impossible to use interrupts when code uses BANK1. I always use bank 1 for the main part and bank 0 for interrupts and saved state, and it works perfectly (for example in the demo FACTS).
-
GPU DMAEN bit and STOREP doesn't work.
When DMAEN is set on the GPU, the high data register does not appear to be written to external memory on a STOREP (or it appears to write a fixed value but not the value in the high register - disabling DMAEN resumes normal operation.)
Didn't test LOADP.
Also, it doesn't work to overlap your OPL and your animation buffer.
In your test, is the OP or the DSP running?
I ran a test and the following code works fine (with neither the OP nor the DSP running):
        .phrase
gpu_code_start3:
        .gpu
        .org    G_RAM

pGflags .equr   r28
cGflags .equr   r29

        movei   #G_FLAGS,pGflags        ; GPU flags
        load    (pGflags),cGflags
        bclr    #3,cGflags
        bclr    #14,cGflags             ; select bank0
        bset    #15,cGflags
        store   cGflags,(pGflags)       ; update the flags
        nop
        nop

        movei   #G_HIDATA,r1
        movei   #$100000,r2
        movei   #256,r3
        movei   #-1,r4
        moveq   #0,r0
.gpu_loop:
        store   r0,(r1)
        storep  r4,(r2)
        addqt   #8,r2
        addqt   #1,r0
        subq    #1,r3
        subqt   #1,r4
        jr      NE,.gpu_loop
        nop

        GPU_STOP

        .68000
gpu_code_end3:
        dc.l    0
-
It's a bit too late, but:
Happy birthday Starcat ! -
Happy birthday mariaud ! -
Happy birthday templeton & xerus !
-
Happy birthday Symmetry ! -
        move.l  #PITCH1|PIXEL16|WID128|XADDPIX,d0
        moveq   #0,d1
loooooop:
        move.l  d0,A2_FLAGS
        move.l  #$100000,A2_BASE
        move.l  d1,A2_PIXEL
        move.l  d1,A2_STEP
        move.l  d0,A1_FLAGS
        move.l  #$80000,A1_BASE
        move.l  d1,A1_PIXEL
        move.l  d1,A1_FPIXEL
        move.l  d1,A1_STEP
        move.l  d1,A1_FSTEP
        move.l  d1,A1_CLIP
        move.l  d1,A1_INC
        move.l  d1,A1_FINC
        move.l  #$00010400,B_COUNT
        move.l  #SRCEN|LFU_REPLACE,B_CMD
        bra.s   loooooop
We have 11 cycles to copy 2 bytes, so about 44 cycles/phrase.
--------------------------------------------------------------------------------------------
        move.l  #PITCH1|PIXEL16|WID128|XADDPIX,d0
        moveq   #0,d1
loooooop:
        move.l  d0,A2_FLAGS
        move.l  #$100000,A2_BASE
        move.l  d1,A2_PIXEL
        move.l  d1,A2_STEP
        move.l  d0,A1_FLAGS
        move.l  #G_RAM+$8000,A1_BASE
        move.l  d1,A1_PIXEL
        move.l  d1,A1_FPIXEL
        move.l  d1,A1_STEP
        move.l  d1,A1_FSTEP
        move.l  d1,A1_CLIP
        move.l  d1,A1_INC
        move.l  d1,A1_FINC
        move.l  #$00010400,B_COUNT
        move.l  #SRCEN|LFU_REPLACE,B_CMD
        bra.s   loooooop
We have 5 cycles for 2 bytes, so about 20 cycles/phrase.
        move.l  #PITCH1|PIXEL16|WID128|XADDPHR,d0
        moveq   #0,d1
loooooop:
        move.l  d0,A2_FLAGS
        move.l  #G_RAM,A2_BASE
        move.l  d1,A2_PIXEL
        move.l  d1,A2_STEP
        move.l  d0,A1_FLAGS
        move.l  #$100000,A1_BASE
        move.l  d1,A1_PIXEL
        move.l  d1,A1_FPIXEL
        move.l  d1,A1_STEP
        move.l  d1,A1_FSTEP
        move.l  d1,A1_CLIP
        move.l  d1,A1_INC
        move.l  d1,A1_FINC
        move.l  #$00010400,B_COUNT
        move.l  #SRCEN|LFU_REPLACE,B_CMD
        bra.s   loooooop
We have 11 cycles/phrase.
For DRAM->GPU, then GPU->DRAM, we have about 20+11 = 31 cycles/phrase, which is < 44 cycles/phrase for the DRAM->DRAM version.
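For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. It assumes a phrase is 8 bytes (64 bits); the function and variable names are only illustrative, and the cycle counts are simply the figures measured above.

```python
PHRASE_BYTES = 8  # assumption: one Jaguar phrase = 64 bits

def cycles_per_phrase(cycles, bytes_copied):
    """Scale a measured cost (cycles per bytes_copied) up to one phrase."""
    return cycles * PHRASE_BYTES // bytes_copied

dram_to_dram = cycles_per_phrase(11, 2)  # pixel mode, DRAM -> DRAM
dram_to_gpu = cycles_per_phrase(5, 2)    # pixel mode, DRAM -> GPU RAM
gpu_to_dram = 11                         # phrase mode, GPU RAM -> DRAM (already per phrase)
two_step = dram_to_gpu + gpu_to_dram     # copy through GPU RAM in two passes

print(dram_to_dram, dram_to_gpu, two_step)
```

With these figures the two-pass copy through GPU RAM stays cheaper than the direct DRAM to DRAM copy, as long as the measured costs hold for the transfer sizes actually used.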
-
I will run a benchmark tonight.
But don't forget that you cannot read at GPU_RAM+$8000; it's write-only access. (To allow faster transfers into the GPU space, all the registers are also available as thirty-two bit memory, at an offset of 8000 hex from their normal addresses. At this address, the internal memory is write only. p43/141)
-
We know that divisions cannot be pipelined, i.e. we need to make sure that the previous division has finished before starting another one.
But I don't see anything in the Jaguar documentation saying whether divisions are "atomic", i.e. whether an interrupt can occur while a division is in progress.
If divisions are atomic, we can use divisions inside interrupts (but the remainder must be saved manually in the interrupt routine, and don't forget that the remainder register is read-only: we cannot restore the remainder value into it at the end of the interrupt routine).
If divisions are not atomic (I hope that's not the case), what happens when an interrupt occurs during a division performed on BANK1 registers? As the interrupt strictly uses BANK0, will the write-back of the division go into BANK0??
I don't know yet... It's the first time I've thought about it.
-
I sized the bands to have the same height as a sprite, so each sprite falls in one or two bands.
-
\o/
thanks
-
I'm a little surprised that nobody here has even mentioned their JagCode work! Since it's a different environment than JS2 I was expecting a different, maybe more technical, conversation.
Anything to be said guys? SCPCD, want to talk about your OP processor? (I'm curious about how much work it was to get that many moving objects on the spiral screen). Orion, any thoughts on building your poly engine? Anyone know who did The MAXX?
The difficulty when we have so many sprites is that the OP takes a lot of the DRAM bandwidth, so we don't have much time to create the object list.
For example, in NTSC, there are 25 blanking lines and 244 visible lines.
So if we push the OP to its limit, we cannot access the DRAM during 244 lines and we have only 25 lines to create the next list and do everything else.
The next limitation is that the OP list must not take more than 63.55µs to reach the STOP object, otherwise there will be glitches on screen.
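To put a rough number on that blanking window, here is a small Python sketch. It only uses figures from this post (25 blanking lines, 63.55µs per line, and the 26.59MHz clock mentioned further down); treating every blanking line as fully available is an assumption, not a measurement.

```python
LINE_US = 63.55     # duration of one scanline, from the post
BLANK_LINES = 25    # NTSC blanking lines, from the post
CLOCK_MHZ = 26.59   # clock quoted later in the post

# Time with the DRAM free of OP traffic, if the OP is saturated on visible lines
blank_us = BLANK_LINES * LINE_US
# microseconds * MHz = cycles
blank_cycles = blank_us * CLOCK_MHZ

print(f"{blank_us:.2f} us, about {blank_cycles:.0f} cycles")
```

This is only an upper bound: anything else that touches the DRAM during blanking eats into the same budget.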
About the technical choices in FACTS:
With a logic analyzer we can see that the object processor has quicker data access for the first 2 phrases of bitmap data (due to a pipeline effect).
So I had the choice between 4x4 sprites and 8x8 sprites.
With 4-pixel-wide sprites, the OP reads the bitmap header and the first line of the sprite in 14 cycles (@26.59MHz), so in 63.55µs we could have 63.55µs*26.59MHz/14 = 120 sprites per line (and the RMW mode doesn't take more cycles with this sprite size \o/).
But creating a list of 244 lines of 120 sprites is impossible, and one line per sprite is too difficult to manage.
It's more interesting to have square sprites.
That's why I cut the sprite list into 60 bands of 4 lines (= 244 visible lines).
But like I said before, we can have 120 sprites per line, so about 120 sprites per band (because the OP must finish the band in 63.55µs).
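The 120 sprites per line figure can be reproduced with a few lines of Python. This is just the formula from the post written out; the constant names are illustrative.

```python
LINE_US = 63.55          # time the OP has to reach the STOP object
CLOCK_MHZ = 26.59        # clock, from the post
CYCLES_PER_SPRITE = 14   # header + first line of a 4-pixel-wide sprite

cycles_per_line = LINE_US * CLOCK_MHZ  # microseconds * MHz = cycles
# Only whole sprites fit, so round down
sprites_per_line = int(cycles_per_line // CYCLES_PER_SPRITE)

print(sprites_per_line)
```

The division leaves a little slack below 121 sprites, which is why the usable limit is rounded down.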
The next thing to think about is the case of a sprite that straddles 2 bands, and for that there are 2 solutions:
- cut the sprite and add it to both bands
or
- have the OP read 2 bands per line so it can finish drawing the previous sprite.
I chose the second one, because it takes the least CPU time.
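Here is a minimal Python sketch of the band idea, to show why reading 2 bands per line is enough. It is not the demo's code: the function names are hypothetical, and it assumes each sprite is stored once, in the band that contains its top line, and that on every line the OP walks the current band plus the one just above it.

```python
BAND_HEIGHT = 4    # lines per band, from the post
SPRITE_HEIGHT = 4  # 4x4 sprites, from the post

def bands_touched(y, height=SPRITE_HEIGHT, band_height=BAND_HEIGHT):
    """Indices of the bands covered by a sprite whose top line is y."""
    first = y // band_height
    last = (y + height - 1) // band_height
    return list(range(first, last + 1))

def home_band(y, band_height=BAND_HEIGHT):
    """Assumption: the sprite is listed only in the band of its top line."""
    return y // band_height

def bands_read_on_line(line, band_height=BAND_HEIGHT):
    """Assumption: the OP processes the previous band and the current one."""
    current = line // band_height
    return [current - 1, current] if current > 0 else [current]

print(bands_touched(0), bands_touched(2), bands_touched(4))
```

Because the band height equals the sprite height, a sprite never covers more than two bands, so listing it once and reading two bands per line reaches every line it covers without duplicating it in the list.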
Then, to give the GPU and the OP the maximum bandwidth, we must limit the use of the DRAM by the DSP and the 68k; for that, in the demo the 68k is stopped and the DSP doesn't use the DRAM to generate the sound (thanks to zerosquare! \o/)
With so many sprites, we have no choice but to create 2 sprite lists.
And we also need a lot of memory.
For the demo, the sprite list takes about 256 kbytes of memory.
In the spiral part, the GPU computes about 2688 sprite coordinates, but not all of these sprites are visible (about 1900 are always visible, and up to 2090 are visible).
It takes about 135 GPU cycles to add a sprite to the list. It's very optimised code: {read sin/cos value, compute x/y coordinates, clip x/y, compute bitmap header, append the sprite to the list} for each sprite.
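As a rough order of magnitude, the list-building cost can be estimated in Python from the figures above. This sketch assumes the GPU runs at the 26.59MHz quoted earlier and that all 2688 computed sprites cost the full 135 cycles, which probably overestimates the clipped ones.

```python
CYCLES_PER_SPRITE = 135   # from the post
SPRITES_COMPUTED = 2688   # from the post
CLOCK_MHZ = 26.59         # assumption: GPU clock equals the clock quoted earlier

total_cycles = CYCLES_PER_SPRITE * SPRITES_COMPUTED
# cycles / MHz = microseconds, then convert to milliseconds
total_ms = total_cycles / CLOCK_MHZ / 1000

print(total_cycles, f"{total_ms:.2f} ms")
```

Under those assumptions the GPU spends a large share of one NTSC frame (roughly 16.7 ms) just building the list, which fits with the remark that the code had to be very optimised.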
To draw the spiral and move it with this precision, there is a very accurate cos/sin table.
All the GPU code takes about 2 kbytes, there are about 2 kbytes of tables for the glass effect, and there are only 4 free bytes left in the GPU memory!
I also used some tricks for the GPU code, like self-modifying code, to reduce its size.
----------
I think my post is not easy to read. It is late and my English is not good, so if you have any questions, I will answer.
-
Happy birthday pmdata ! -
Happy birthday Kuk ! -
Happy birthday Scrat !! -
Happy birthday GT !!! -
happy birthday stabylo
-
A long time without news, yes we are overbooked !! Working on a lot of projects, a lot of them will be present at the A.C. party in march (News of the A.C.
If I'm not mistaken, AC is in April.
-
I'm not sure, but I think that none of the blitter's registers are double buffered.
[edit: "The data registers may only be written to while the Blitter is idle." page 70/141 of the Jag_v8 documentation]
The Jag2 blitter does have double buffering (for some registers).
-
Happy birthday !
(Already !!! )
Another Birthday, Yes Again !!! :)