Jagware

SCPCD

Level2
  • Content count

    1,134
Everything posted by SCPCD

  1. Happy Birthday

    Happy birthday Tursi
  2. Mindebug : The Basis For A Jaguar Debugger

    I think the problem is that you upload mindebug.jag to $4000. You should upload mindebug.jag to $1FF000 instead (I haven't tried it, I only read it in Makefile.bat), otherwise you will overwrite the mindebug program when you upload the new program with it.
  3. Retro Gaming Week End !!

    My Jaggy has:
    - a switch to select the frequency between the original one and 32 MHz (the module on the left)
    - a switch for the 50/60 Hz display mode (on the front right)
    - a switch to enable the fan (on the back right: the black one)
    - a switch to select between the original ROM and the BJL one (on the back right)
    - an external 3.5 mm stereo jack for audio

    Last year I also added 3 internal connectors that let me easily connect my logic analyzer to the Jaguar bus (data, address, control lines and some other signals), for a total of about 140 signals. This is very helpful for debugging software and hardware (the JagCF, for example) or for optimising software.
  4. 0 0 0 0 0 0 0 0

    Happy birthday Zerosquare
  5. Another Birthday, Yes Again !!! :)

    happy birthday Orion_ !!
  6. Another³ Birthday !!!

    Happy birthday Azrael !
  7. Another Birthday Yes !!

    Happy birthday Fredifredo !!! Sorry for being late
  8. Gpu Dmaen Bit Breaks Storep

    I think I have found why it doesn't work: you set the GPU's DMA priority mode when it starts, which means the GPU always runs at DMA priority, so every external access by the GPU core breaks the bus. For example:

        loadw (r20),r24         ; save old scanline
        .lwait1:
            nop
            loadw (r20),r4
            and r18,r4          ; mask VC off to just a line counter
            shrq #1,r4          ; divide by 2
            cmp r4,r24          ; wait for a change
            jr EQ,.lwait1
            nop
            move r4,r24         ; save the old value

    r20 points to the VC register, which is external to the GPU core, like every register that is not in the GPU section (all registers from page 10 to 17 are regarded as external registers, like the Jerry registers).

    Written like this, the GPU takes priority during OP processing. I think strange things could happen in that case.

    A solution to have DMA priority only during the VBLANK is to add:

        cpuint0:
            movei #G_FLAGS,r30
            load (r30),r29      ; read flags
            bset #15,r29        ; DMA mode
            store r29,(r30)
            nop

    at the start of the interrupt routine, and to modify the interrupt exit routine like this:

        exitint:                ; finished interrupt, clean up
            movei #G_FLAGS,r30
            load (r30),r29      ; read flags
            bclr #3,r29         ; clear IMASK
            bset #9,r29         ; reset CPU int
            bclr #15,r29        ; return to normal mode

    The GPU runs in normal priority mode during normal operation, switches to the higher priority level during the interrupt routine, and returns to normal mode after the interrupt.
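    The bit operations in the two routines above can be checked on paper. The following is a minimal Python sketch, not Jaguar code: it only models the value that ends up in r29, and it assumes the bit meanings given in the code comments of the post (bit 3 = IMASK, bit 9 = reset CPU interrupt, bit 15 = DMA mode). The function names are hypothetical.

```python
# Bit positions taken from the comments in the post's assembly (assumed, not verified here).
IMASK_BIT = 3        # interrupt mask, cleared when leaving the interrupt
CPU_INT_CLR_BIT = 9  # set to reset the CPU interrupt
DMAEN_BIT = 15       # DMA priority mode


def bset(value: int, bit: int) -> int:
    """Model of the GPU 'bset' instruction on a 32-bit register."""
    return (value | (1 << bit)) & 0xFFFFFFFF


def bclr(value: int, bit: int) -> int:
    """Model of the GPU 'bclr' instruction on a 32-bit register."""
    return value & ~(1 << bit) & 0xFFFFFFFF


def enter_interrupt(flags: int) -> int:
    """Value of r29 after the 'cpuint0' prologue: DMA priority switched on."""
    return bset(flags, DMAEN_BIT)


def exit_interrupt(flags: int) -> int:
    """Value of r29 after the 'exitint' sequence: back to normal priority."""
    flags = bclr(flags, IMASK_BIT)
    flags = bset(flags, CPU_INT_CLR_BIT)
    return bclr(flags, DMAEN_BIT)


if __name__ == "__main__":
    inside = enter_interrupt(1 << IMASK_BIT)  # IMASK is set while in the interrupt
    print(hex(inside), hex(exit_interrupt(inside)))
```

    The point of the sketch is only that bit 15 is set for the duration of the interrupt and cleared again on exit, so the main code never sees DMA priority.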
  9. Gpu Dmaen Bit Breaks Storep

    It means that the instruction right after the store must not use any bit of the flags. For example:

        movei #GPU_FLAGS,r30
        load (r30),r29
        bset #14,r29            ; use of BANK1
        store r29,(r30)
        jump T,(r3)             ; r3 is valid in BANK1
        nop

    This code is not safe, because it is not certain that the r3 used by the jump instruction is read from BANK1. But:

        movei #GPU_FLAGS,r30
        load (r30),r29
        bset #14,r29            ; use of BANK1
        store r29,(r30)
        nop
        jump T,(r3)             ; r3 is valid in BANK1
        nop

    is safe, because the wait-state slot gives the pipeline time to write the r29 result correctly to the GPU_FLAGS register.

    The next code is also safe:

        movei #GPU_FLAGS,r30
        load (r30),r29
        bset #14,r29            ; use of BANK1
        jump T,(r3)             ; r3 is valid in BANK0 <- Warning: bank 0, not bank 1!
        store r29,(r30)

    because:
    - the jump instruction reads the r3 register and the flags (which are not used here because of T) before the store instruction executes
    - then the store instruction executes
    - and then the jump is taken.

    Yes, because we should read r28 from the correct bank (which is actually BANK0, selected by IMASK = 1), and consequently we should read r28 before writing to the flags register. And since we must return to the previous bank (0 or 1, depending on the bank used before the interrupt) before returning to the interrupted code, the only good place for the store is the "nop" slot of the jump.

    No, it's just that the coder should take the pipeline effect into account when he modifies the flags register for the next instruction (as with all instructions that don't have writeback protection).

    Has anybody experienced such things? I may have a (random) bug related to this.

    Except that this is one of the bugs reported in the "lies and damned lies" that is not true. If this didn't work, it would be impossible to use interrupts when code uses BANK1, and I always use bank 1 for the main part and bank 0 for interrupts and saved state, and it works perfectly (for example in the demo FACTS).
  10. Gpu Dmaen Bit Breaks Storep

    In your test, is the OP or the DSP running? I ran a test and the following code works fine (with neither the OP nor the DSP running):

            .phrase
        gpu_code_start3:
            .gpu
            .org G_RAM
        pGflags .equr r28
        cGflags .equr r29
            movei #G_FLAGS,pGflags      ; GPU flags
            load (pGflags),cGflags
            bclr #3,cGflags
            bclr #14,cGflags            ; select bank0
            bset #15,cGflags
            store cGflags,(pGflags)     ; update the flags
            nop
            nop
            movei #G_HIDATA,r1
            movei #$100000,r2
            movei #256,r3
            movei #-1,r4
            moveq #0,r0
        .gpu_loop:
            store r0,(r1)
            storep r4,(r2)
            addqt #8,r2
            addqt #1,r0
            subq #1,r3
            subqt #1,r4
            jr NE,.gpu_loop
            nop
            GPU_STOP
            .68000
        gpu_code_end3:
            dc.l 0
  11. Happy Birthday !!

    It's a bit too late, but: Happy birthday Starcat !
  12. Mariaud Is In Da Place !!

    Happy birthday mariaud !
  13. Happy birthday templeton & xerus !
  14. And Yet Another !

    Happy birthday Symmetry !
  15. Blitter Timing

    The goal of this topic is to give information about the timing of different blitter accesses and to learn more about how the blitter operates.

    First of all, the example code:

        move.l #PITCH1|PIXEL32|WID128|XADDPHR,d0
        moveq #0,d1
        move.l d0,A2_FLAGS
        move.l #source,A2_BASE
        move.l d1,A2_PIXEL
        move.l d1,A2_STEP
        move.l d0,A1_FLAGS
        move.l #destination,A1_BASE
        move.l d1,A1_PIXEL
        move.l d1,A1_FPIXEL
        move.l d1,A1_STEP
        move.l d1,A1_FSTEP
        move.l d1,A1_CLIP
        move.l d1,A1_INC
        move.l d1,A1_FINC
        move.l #$00010400,B_COUNT
        move.l #SRCEN|LFU_REPLACE,B_CMD

    This configures the blitter for 32-bit pixels and transfers in phrase mode. Source and destination differ for each case described below. We blit $0001 * $0400 * 4 (because 32-bit pixels are selected) = 4096 bytes. SRCEN enables a source, and LFU_REPLACE gives a simple data copy. All other blitter registers are unused here and initialised to zero.

    The GPU->DRAM transfer in phrase mode:
    source: $F03000 (G_RAM)
    destination: somewhere in DRAM (phrase aligned)
    Result: 11 cycles per phrase -> 4096*11/8 = 5632 cycles for 4K

    The DRAM->GPU transfer in phrase mode:
    source: somewhere in DRAM (phrase aligned)
    destination: $F03000 (G_RAM)
    Result: 7 cycles per phrase -> 4096*7/8 = 3584 cycles for 4K

    The fast DRAM->GPU transfer in phrase mode:
    source: somewhere in DRAM (phrase aligned)
    destination: $F03000+$8000 (G_RAM+$8000)
    Result: 5 cycles per phrase -> 4096*5/8 = 2560 cycles for 4K

    -----------------------------------------------

    More information will follow in the future. If you have a particular timing you would like measured, I can help.
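    The arithmetic behind these figures can be reproduced with a short Python sketch. It only redoes the calculation from the post; the function names are hypothetical, and the B_COUNT layout (outer count in the high word, inner count in the low word) is assumed from the "$0001 * $0400" product given above.

```python
PHRASE_BYTES = 8  # one phrase = 64 bits


def blit_bytes(b_count: int, bytes_per_pixel: int) -> int:
    """Bytes moved by one blit: outer count * inner count * pixel size."""
    outer = (b_count >> 16) & 0xFFFF
    inner = b_count & 0xFFFF
    return outer * inner * bytes_per_pixel


def total_cycles(nbytes: int, cycles_per_phrase: int) -> int:
    """Total cycles for a phrase-mode transfer of nbytes."""
    return nbytes * cycles_per_phrase // PHRASE_BYTES


if __name__ == "__main__":
    size = blit_bytes(0x00010400, 4)  # 32-bit pixels -> 4 bytes each
    for name, cpp in (("GPU->DRAM", 11), ("DRAM->GPU", 7), ("DRAM->GPU+$8000", 5)):
        print(f"{name}: {total_cycles(size, cpp)} cycles for {size} bytes")
```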
  16. Blitter Timing

        move.l #PITCH1|PIXEL16|WID128|XADDPIX,d0
        moveq #0,d1
    loooooop:
        move.l d0,A2_FLAGS
        move.l #$100000,A2_BASE
        move.l d1,A2_PIXEL
        move.l d1,A2_STEP
        move.l d0,A1_FLAGS
        move.l #$80000,A1_BASE
        move.l d1,A1_PIXEL
        move.l d1,A1_FPIXEL
        move.l d1,A1_STEP
        move.l d1,A1_FSTEP
        move.l d1,A1_CLIP
        move.l d1,A1_INC
        move.l d1,A1_FINC
        move.l #$00010400,B_COUNT
        move.l #SRCEN|LFU_REPLACE,B_CMD
        bra.s loooooop

    We have 11 cycles to copy 2 bytes, so about 44 cycles/phrase.

    --------------------------------------------------------------------------------------------

        move.l #PITCH1|PIXEL16|WID128|XADDPIX,d0
        moveq #0,d1
    loooooop:
        move.l d0,A2_FLAGS
        move.l #$100000,A2_BASE
        move.l d1,A2_PIXEL
        move.l d1,A2_STEP
        move.l d0,A1_FLAGS
        move.l #G_RAM+$8000,A1_BASE
        move.l d1,A1_PIXEL
        move.l d1,A1_FPIXEL
        move.l d1,A1_STEP
        move.l d1,A1_FSTEP
        move.l d1,A1_CLIP
        move.l d1,A1_INC
        move.l d1,A1_FINC
        move.l #$00010400,B_COUNT
        move.l #SRCEN|LFU_REPLACE,B_CMD
        bra.s loooooop

    We have 5 cycles for 2 bytes, so about 20 cycles/phrase.

        move.l #PITCH1|PIXEL16|WID128|XADDPHR,d0
        moveq #0,d1
    loooooop:
        move.l d0,A2_FLAGS
        move.l #G_RAM,A2_BASE
        move.l d1,A2_PIXEL
        move.l d1,A2_STEP
        move.l d0,A1_FLAGS
        move.l #$100000,A1_BASE
        move.l d1,A1_PIXEL
        move.l d1,A1_FPIXEL
        move.l d1,A1_STEP
        move.l d1,A1_FSTEP
        move.l d1,A1_CLIP
        move.l d1,A1_INC
        move.l d1,A1_FINC
        move.l #$00010400,B_COUNT
        move.l #SRCEN|LFU_REPLACE,B_CMD
        bra.s loooooop

    We have 11 cycles/phrase for GPU->DRAM, so for DRAM->GPU followed by GPU->DRAM we have about 20+11 = 31 cycles/phrase, which is less than the 44 cycles/phrase of the DRAM->DRAM version.
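    The comparison between the direct copy and the copy through GPU RAM comes down to a per-phrase conversion. A minimal Python sketch of that reasoning, using only the cycle counts measured above (the function name is hypothetical):

```python
PHRASE_BYTES = 8


def cycles_per_phrase(cycles_per_pixel: int, bytes_per_pixel: int) -> int:
    """Convert a pixel-mode cost into an equivalent cost per 8-byte phrase."""
    return cycles_per_pixel * (PHRASE_BYTES // bytes_per_pixel)


# Pixel-mode measurements with 16-bit pixels (2 bytes each).
dram_to_dram = cycles_per_phrase(11, 2)  # DRAM -> DRAM
dram_to_gpu = cycles_per_phrase(5, 2)    # DRAM -> G_RAM+$8000
gpu_to_dram = 11                         # phrase mode, already per phrase

two_step = dram_to_gpu + gpu_to_dram

if __name__ == "__main__":
    print(f"direct: {dram_to_dram} cycles/phrase, via GPU RAM: {two_step} cycles/phrase")
```

    This ignores the cost of setting up two blits instead of one, which the measurements above do not cover.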
  17. Blitter Timing

    I will run the benchmark tonight. But don't forget that you cannot read at GPU_RAM+$8000: it is a write-only access. (To allow faster transfers into the GPU space, all the registers are also available as thirty-two bit memory, at an offset of 8000 hex from their normal addresses. At this address, the internal memory is write only. p43/141)
  18. The GPU

    We know that divisions cannot be pipelined, i.e. we need to make sure the previous division has finished before starting another one. But I don't see anything in the Jaguar documentation saying that divisions are "atomic", i.e. that no interrupt can occur while a division is in progress.

    If divisions are atomic, we can use divisions in interrupts (but the remainder has to be saved manually in the interrupt routine, and don't forget that the remainder register is read-only: we cannot restore the remainder value into it at the end of the interrupt routine).

    If divisions are not atomic (I hope that is not the case), what happens when an interrupt occurs during a division performed on BANK1 registers? If the interrupt strictly uses BANK0, will the division result be written back into BANK0? I don't know yet... It's the first time I have thought about it.
  19. Jagcode 2 Entries

    I sized the bands to have the same height as a sprite, so each sprite is in one or two bands.
  20. Yet Another Birthday...

    \o/ thanks
  21. Jagcode 2 Entries

    The difficulty with so many sprites is that the OP takes a lot of DRAM bandwidth, so there is not much time left to build the Object List. For example, in NTSC there are 25 blanking lines and 244 visible lines, so if we push the OP to its limit, we cannot access the DRAM during 244 lines and we have only 25 lines to build the next list and do everything else. The next limitation is that the OP must not take more than 63.55µs to reach the STOP object, otherwise there will be glitches on screen.

    About the technical choices in FACTS:

    With a logic analyzer we can see that the object processor has quicker data access for the first 2 phrases of bitmap data (due to a pipeline effect). So I had the choice between 4x4 sprites and 8x8 sprites. With 4-pixel-wide sprites, the OP reads the BMP header and the first line of the sprite in 14 cycles (@26.59MHz), so in 63.55µs we could have 63.55µs*26.59MHz/14 = 120 sprites per line (and the RMW mode doesn't take more cycles with this sprite size \o/). But creating a list of 244 lines of 120 sprites is impossible, and one line per sprite is too difficult to manage. It is more interesting to have square sprites.

    That's why I cut the sprite list into 60 bands of 4 lines (covering roughly the 244 visible lines). But as I said before, we can have 120 sprites per line, so about 120 sprites in a band (because the OP has to finish the band in 63.55µs).

    The next thing to think about is the case of a sprite that lies between 2 bands, and for that there are 2 solutions:
    - cut the sprite and add it to both bands, or
    - have the OP read 2 bands per line so it can finish drawing the previous sprite.
    I chose the 2nd one, because it takes less CPU time.

    Then, to leave the maximum bandwidth to the GPU and the OP, we should limit the use of the DRAM by the DSP and the 68k. For that, in the demo the 68k is stopped and the DSP doesn't use the DRAM to generate the sound (thanks to zerosquare ! \o/)

    With so many sprites, we have no choice but to create 2 sprite lists, and we also need a lot of memory. For the demo, the sprite list takes about 256 kbytes of memory.

    In the spiral part, the GPU computes about 2688 sprite coordinates, but not all of these sprites are visible (about 1900 are always visible, and up to 2090 can be visible). It takes about 135 GPU cycles to add a sprite to the list. It is very optimised code: {read sin/cos value, compute x/y coordinates, clip x/y, compute BMP header, append the sprite to the list} for each sprite. To draw the spiral and move it with this precision, there is a very accurate cos/sin table.

    All the GPU code takes about 2 kbytes, there are about 2 kbytes of tables for the glass effect, and there are only 4 free bytes in the GPU memory! I also used some tricks in the GPU code, like self-modifying code, to reduce its size.

    ----------

    I think my post is not easy to read. It is late and I'm not good at English, so if you have any questions, I will answer.
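    The sprite budget quoted in this post follows from a couple of multiplications. Here is a minimal Python sketch that redoes them with the figures from the post; the names are hypothetical, and the per-frame GPU total is only the product of the two numbers given (2688 sprites, about 135 cycles each), not a measurement.

```python
LINE_TIME_US = 63.55      # NTSC line time from the post
OP_CLOCK_MHZ = 26.59      # clock from the post
CYCLES_PER_SPRITE = 14    # header + first line of a 4-pixel-wide sprite


def sprites_per_line() -> int:
    """How many 14-cycle sprites fit in one scanline, rounded down."""
    cycles_per_line = LINE_TIME_US * OP_CLOCK_MHZ  # µs * MHz = cycles
    return int(cycles_per_line / CYCLES_PER_SPRITE)


def gpu_cycles_for_list(sprite_count: int, cycles_per_sprite: int) -> int:
    """Approximate GPU cycles needed to append every sprite to the list."""
    return sprite_count * cycles_per_sprite


if __name__ == "__main__":
    print("sprites per line:", sprites_per_line())
    print("GPU cycles for the spiral list:", gpu_cycles_for_list(2688, 135))
```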
  22. Pmdoomata Birthday !

    Happy birthday pmdata !
  23. Happy Birthday Kuk !

    Happy birthday Kuk !
  24. This Time It's For The Squirrel

    Happy birthday Scrat !!