Jagware

SCPCD

Posts posted by SCPCD


  1. Yeah, the OP is running (though the machine is in VBlank and so it should only be executing branch and stop objects). It's actually just my JagLion code except I changed the initialization of the GPU to set DMAEN, and the only place that uses STOREP is the buffer clear function. Instead of clearing it I got a regular pattern.

     

    If you want to try it, the latest version went on my site today (just a lot of little bugfixes to improve stability, and I was using it as a test base for my project). http://harmlesslion.com/software/jaglion

     

    You'll find the line for setting DMAEN commented out in CALCMAND.S on line 240.

     

    Of course, if you find a bug in my work there that'd be great to clear that pending issue. Or even if you confirm it. :)

    I think I have found why it doesn't work:

    You set the GPU's DMA priority mode when it starts, but this means that the GPU always runs at DMA priority, so every external access made by the GPU core grabs the bus.

    For example:

        loadw (r20),r24    ; save old scanline
    .lwait1:
        nop
        loadw (r20),r4
        and r18,r4        ; mask VC off to just a line counter
        shrq #1,r4        ; divide by 2
        cmp r4,r24        ; wait for a change
        jr EQ,.lwait1
        nop
        move r4,r24        ; save the old value

    r20 points to the VC register, which is external to the GPU core, like every register that is not in the GPU section (all registers from page 10 to page 17 are treated as external registers, just like the Jerry registers).

     

    This way, the GPU takes priority while the OP is processing.

    I think strange things can happen in that case.

     

    A solution that gives DMA priority only during the VBLANK is to add:

    cpuint0:
        movei #G_FLAGS,r30
        load (r30),r29            ; read flags    
        bset #15,r29            ; DMA mode
        store    r29,(r30)
        nop

    at the start of the interrupt routine, and to modify the interrupt exit routine like this:

    exitint:
    ; finished interrupt, clean up
        movei #G_FLAGS,r30
        load (r30),r29            ; read flags
        bclr #3,r29                ; clear IMASK
        bset #9,r29                ; reset CPU int
        
        bclr #15,r29            ; return to normal mode

     

    The GPU runs at normal priority during normal operation, switches to the higher priority level for the duration of the interrupt routine, and returns to normal priority after the interrupt.
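    The flag handling described above can be sketched as plain bit operations. This is only an illustration in Python, not GPU code: the bit numbers (3, 9, 15) are the ones used by the bset/bclr instructions above, while the function names and the starting flag value are hypothetical.

```python
# Bit positions taken from the bset/bclr operands in the GPU code above.
IMASK_BIT = 3         # interrupt mask, cleared on exit
CPU_INT_CLR_BIT = 9   # written as 1 to reset the CPU interrupt
DMAEN_BIT = 15        # DMA priority mode


def enter_interrupt(flags: int) -> int:
    """Mirror 'bset #15,r29': raise the GPU to DMA priority."""
    return flags | (1 << DMAEN_BIT)


def exit_interrupt(flags: int) -> int:
    """Mirror the exit sequence: bclr #3, bset #9, bclr #15."""
    flags &= ~(1 << IMASK_BIT)
    flags |= 1 << CPU_INT_CLR_BIT
    flags &= ~(1 << DMAEN_BIT)
    return flags & 0xFFFFFFFF


# Hypothetical starting value: only IMASK set when the handler begins.
flags_in_handler = enter_interrupt(1 << IMASK_BIT)
flags_on_exit = exit_interrupt(flags_in_handler)
print(f"in handler: {flags_in_handler:#06x}, on exit: {flags_on_exit:#06x}")
```

    The point of the sketch is that DMAEN is set only between entry and exit, so the value written back at the end of the handler never carries bit 15.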

     

     


  2. So let's deviate from the original subject :P

    Does it mean that for leaving an interruption handler, the sample code

    given by Atari does not always work?

     

    WARNING - writing a value to the flag bits and making use of those flag bits in the following instruction will

    not work properly due to pipe-lining effects. If it is necessary to use flags set by a STORE instruction, then

    ensure that at least one other instruction lies between the STORE and the flags dependent instruction.

    It means that the instruction right after the STORE must not depend on any bit of the flags register.

    For example:

    movei #GPU_FLAGS,r30
    load (r30),r29
    bset #14,r29    ; use of BANK1
    store r29,(r30)   
    jump T,(r3)        ; r3 is valid in BANK1
    nop

    This code is not safe, because there is no guarantee that the jump instruction reads r3 from BANK1.

    But:

    movei #GPU_FLAGS,r30
    load (r30),r29
    bset #14,r29    ; use of BANK1
    store r29,(r30)
    nop
    jump T,(r3)        ; r3 is valid in BANK1
    nop

    is safe, because the extra nop gives the pipeline time to write the r29 result to the GPU_FLAGS register correctly.

     

     

    The next code is also safe:

    movei #GPU_FLAGS,r30
    load (r30),r29
    bset #14,r29    ; use of BANK1
    jump T,(r3)        ; r3 is valid in BANK0 <- Warning: bank 0, not bank 1!
    store r29,(r30)

    because:

    - the jump instruction reads the r3 register and the flags (not used here because the condition is T) before the store instruction executes,

    - then the store instruction executes,

    - and then the jump is taken.
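    The three steps above can be sketched as a toy model. This is Python, not a description of the real pipeline: the two banks, the function name and the addresses are all hypothetical, and the model only encodes the ordering stated in the list (register read, then store, then jump).

```python
# Toy model only: two register banks and a bank-select value.
def run_jump_with_store_in_delay_slot(banks, current_bank, new_bank, reg="r3"):
    target = banks[current_bank][reg]  # 1. the jump reads its register first
    current_bank = new_bank            # 2. the delay-slot store updates the flags
    return target, current_bank        # 3. the jump is taken


# Hypothetical register contents for the two banks.
banks = {0: {"r3": 0x4000}, 1: {"r3": 0x5000}}
target, bank_after = run_jump_with_store_in_delay_slot(
    banks, current_bank=0, new_bank=1
)
print(f"jump target {target:#x}, bank after the jump: {bank_after}")
```

    In this model the jump target comes from the bank that was active before the store, and the new bank is in effect once the jump has been taken, which is the behaviour the list describes.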

     

    int_serv:
      movei GPU_FLAGS,r30; point R30 at flags register
      load (r30),r29; get flags
      bclr 3,r29; clear IMASK
      bset 11,r29; and interrupt 2 latch
      load (r31),r28; get last instruction address
      addq 2,r28; point at next to be executed
      addq 4,r31; updating the stack pointer
      jump (r28); and return
      store r29,(r30); restore flags

     

    Is the restore flags instruction in the right place?

    Yes, because r28 must be read from the correct bank (which is BANK0 here, selected by IMASK = 1), so r28 has to be read before the flags register is written.

    And since we must switch back to the previous bank (0 or 1, depending on the bank in use before the interrupt) before returning to the interrupted code, the only right place for the store is the "nop" slot of the jump :)

     

     

    If I understand the warning correctly, they say that

    it is not safe to put the last store after the jump.

    No, it just means that the coder should take the pipeline effect into account when modifying the flags register for the next instruction (as with all instructions that don't have writeback protection).

     

    This theory is consistent with the following remark (found in bug sections)

    · We've found that you can't put the IMASK clear in the delay slot of the jump out of the interrupt, because

    the instruction that was interrupted may not get the correct register bank (TWI - Brian McKee)

     

    Has anybody experienced such things?

    I may have a (random) bug related to this.

    Except that this bug belongs with the "lies and damned lies": it is not true.

     

    If this didn't work, it would be impossible to use interrupts when code uses BANK1. I always use bank 1 for the main part and bank 0 for interrupts and saved state, and it works perfectly (for example in the FACTS demo).

     


  3. GPU DMAEN bit and STOREP doesn't work.

     

    When DMAEN is set on the GPU, the high data register does not appear to be written to external memory on a STOREP (or it appears to write a fixed value but not the value in the high register - disabling DMAEN resumes normal operation.)

     

    Didn't test LOADP.

     

    Also, it doesn't work to overlap your OPL and your animation buffer. ;)

    In your test, is the OP or the DSP running?

    I ran a test, and the following code works fine (neither the OP nor the DSP running):

     

    	.phrase
    gpu_code_start3:
    .gpu
    .org		G_RAM
    pGflags					.equr	r28
    cGflags					.equr	r29
    
    movei		#G_FLAGS,pGflags			;GPU flags
    load		(pGflags),cGflags
    bclr		#3,cGflags
    bclr		#14,cGflags					;select bank0
    bset		#15,cGflags
    store		cGflags,(pGflags)			;update the flags
    
    nop
    nop
    movei		#G_HIDATA,r1
    movei		#$100000,r2
    
    movei		#256,r3
    
    movei		#-1,r4
    moveq		#0,r0
    .gpu_loop:
    store		r0,(r1)
    storep		r4,(r2)
    
    addqt		#8,r2
    addqt		#1,r0
    subq		#1,r3
    subqt		#1,r4
    
    jr			NE,.gpu_loop
    nop
    
    GPU_STOP
    
    .68000
    gpu_code_end3:
    dc.l		0
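
    To check the result of a test like the one above, the expected memory contents can be generated offline and compared with a dump. The Python sketch below does that under two assumptions that are not stated in the post: STOREP writes G_HIDATA as the upper long of the phrase, and the upper long sits at the lower address (big-endian). The function names are hypothetical.

```python
import struct

BASE = 0x100000  # destination used by the test (movei #$100000,r2)
COUNT = 256      # loop counter (movei #256,r3)


def expected_phrases():
    """Yield (address, high_long, low_long) for each STOREP in the loop."""
    for i in range(COUNT):
        high = i                              # r0, stored to G_HIDATA
        low = (0xFFFFFFFF - i) & 0xFFFFFFFF   # r4, starts at -1 and counts down
        yield BASE + 8 * i, high, low


def expected_bytes() -> bytes:
    # Assumption: big-endian layout, high long at the lower address.
    return b"".join(
        struct.pack(">II", high, low) for _, high, low in expected_phrases()
    )


phrases = list(expected_phrases())
image = expected_bytes()
print(f"{len(phrases)} phrases, {len(image)} bytes")
```

    If the high long in the dump does not follow the 0, 1, 2, ... sequence, the high data register was not written as expected.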

     


  4. move.l		#PITCH1|PIXEL16|WID128|XADDPIX,d0
    moveq		#0,d1
    loooooop:
    move.l		d0,A2_FLAGS
    move.l		#$100000,A2_BASE
    move.l		d1,A2_PIXEL
    move.l		d1,A2_STEP
    
    move.l		d0,A1_FLAGS
    move.l		#$80000,A1_BASE
    move.l		d1,A1_PIXEL
    move.l		d1,A1_FPIXEL
    move.l		d1,A1_STEP
    move.l		d1,A1_FSTEP
    move.l		d1,A1_CLIP
    move.l		d1,A1_INC
    move.l		d1,A1_FINC
    
    move.l		#$00010400,B_COUNT
    move.l		#SRCEN|LFU_REPLACE,B_CMD
    
    bra.s		loooooop
    

    post-5-1210789893_thumb.jpg

     

    We have 11 cycles to copy 2 bytes, so about 44 cycles per phrase.

     

    --------------------------------------------------------------------------------------------

    move.l		#PITCH1|PIXEL16|WID128|XADDPIX,d0
    moveq		#0,d1
    loooooop:
    move.l		d0,A2_FLAGS
    move.l		#$100000,A2_BASE
    move.l		d1,A2_PIXEL
    move.l		d1,A2_STEP
    
    move.l		d0,A1_FLAGS
    move.l		#G_RAM+$8000,A1_BASE
    move.l		d1,A1_PIXEL
    move.l		d1,A1_FPIXEL
    move.l		d1,A1_STEP
    move.l		d1,A1_FSTEP
    move.l		d1,A1_CLIP
    move.l		d1,A1_INC
    move.l		d1,A1_FINC
    
    move.l		#$00010400,B_COUNT
    move.l		#SRCEN|LFU_REPLACE,B_CMD
    
    bra.s		loooooop
    

     

    post-5-1210790400_thumb.jpg

     

    We have 5 cycles for 2 bytes, so about 20 cycles per phrase.

     

    move.l		#PITCH1|PIXEL16|WID128|XADDPHR,d0
    moveq		#0,d1
    loooooop:
    move.l		d0,A2_FLAGS
    move.l		#G_RAM,A2_BASE
    move.l		d1,A2_PIXEL
    move.l		d1,A2_STEP
    
    move.l		d0,A1_FLAGS
    move.l		#$100000,A1_BASE
    move.l		d1,A1_PIXEL
    move.l		d1,A1_FPIXEL
    move.l		d1,A1_STEP
    move.l		d1,A1_FSTEP
    move.l		d1,A1_CLIP
    move.l		d1,A1_INC
    move.l		d1,A1_FINC
    
    move.l		#$00010400,B_COUNT
    move.l		#SRCEN|LFU_REPLACE,B_CMD
    
    bra.s		loooooop
    

     

    post-5-1210790876_thumb.jpg

     

    We have 11 cycles per phrase.

     

    For DRAM->GPU followed by GPU->DRAM we have about 20+11 = 31 cycles per phrase, which is less than the 44 cycles per phrase of the DRAM->DRAM version.
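
    The arithmetic behind these figures can be reproduced with a few lines of Python. The measured values (11 cycles per 2 bytes, 5 cycles per 2 bytes, 11 cycles per phrase) come from the three tests above; the helper name is hypothetical, and a phrase is taken as 8 bytes.

```python
BYTES_PER_PHRASE = 8


def cycles_per_phrase(cycles: float, bytes_moved: int) -> float:
    """Scale a measured cost for `bytes_moved` bytes up to one 8-byte phrase."""
    return cycles * BYTES_PER_PHRASE / bytes_moved


dram_to_dram = cycles_per_phrase(11, 2)  # first test, pixel mode
dram_to_gpu = cycles_per_phrase(5, 2)    # second test, pixel mode
gpu_to_dram = 11                         # third test, measured per phrase
two_step = dram_to_gpu + gpu_to_dram

print(f"DRAM->DRAM: {dram_to_dram:.0f} cycles/phrase")
print(f"DRAM->GPU->DRAM: {two_step:.0f} cycles/phrase")
```

    Note that the comparison mixes a pixel-mode measurement with a phrase-mode one, exactly as the post does.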

     


    I will run a benchmark tonight.

     

    But don't forget that you cannot read at GPU_RAM+$8000: it is write-only access. ;) ("To allow faster transfers into the GPU space, all the registers are also available as thirty-two bit memory, at an offset of 8000 hex from their normal addresses. At this address, the internal memory is write only." p43/141)

     


    We know that divisions cannot be pipelined, i.e. we need to make sure that the previous division has finished before starting another one.

     

    But I don't see anything in the Jaguar documentation about whether divisions are "atomic", i.e. whether an interrupt can occur while a division is in progress.

    If divisions are atomic, we can use divisions in interrupts (but the remainder has to be saved manually in the interrupt routine, and don't forget that the remainder register is read-only: we cannot restore the remainder value into it at the end of the interrupt routine).

     

    If divisions are not atomic (I hope that is not the case), what happens when an interrupt occurs during a division performed in the BANK1 registers? Since the interrupt strictly uses BANK0, would the write-back of the division end up in BANK0??

    I don't know yet... It's the first time I've thought about it :unsure:


  7. I'm a little surprised that nobody here has even mentioned their JagCode work! Since it's a different environment than JS2 I was expecting a different, maybe more technical, conversation. :)

     

    Anything to be said guys? :) SPCPD, want to talk about your OP processor? (I'm curious about how much work it was to get that many moving objects on the spiral screen). Orion, any thoughts on building your poly engine? Anyone know who did The MAXX?

    :)

     

    The difficulty with so many sprites is that the OP takes a lot of the DRAM bandwidth, so there is not much time left to create the object list.

    For example, in NTSC there are 25 blanking lines and 244 visible lines.

    So if we push the OP to its limit, we cannot access the DRAM during the 244 visible lines, and we have only 25 lines to create the next list and do everything else.

    The next limitation is that the OP must reach the STOP object of the list in no more than 63.55µs, otherwise there will be glitches on screen.

     

    About technical choice on FACTS :

     

    With a logic analyzer we can see that the object processor has faster data access for the first 2 phrases of bitmap data (due to a pipeline effect).

    So I had the choice between 4x4 sprites and 8x8 sprites.

     

    With sprites 4 pixels wide, the OP reads the BMP header and the first line of the sprite in 14 cycles (@26.59MHz), so in 63.55µs we could have 63.55µs*26.59MHz/14 = 120 sprites per line (and the RMW mode doesn't take more cycles with this sprite size \o/).

    But creating a list of 244 lines of 120 sprites is impossible :D and 1 line per sprite is too difficult to manage.

    It's more interesting to have square sprites :)

    That's why I cut the sprite list into 60 bands of 4 lines (=244 visible lines) :)

    But as I said before, we can have 120 sprites per line, so about 120 sprites per band (because the OP must finish the band in 63.55µs ;)).
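
    The 120 sprites per line figure can be checked with a short Python calculation. All three constants are the ones quoted above (63.55µs per line, 26.59MHz, 14 cycles per 4-pixel-wide sprite); the variable names are hypothetical.

```python
LINE_TIME_S = 63.55e-6   # duration of one scanline
CLOCK_HZ = 26.59e6       # clock frequency quoted above
CYCLES_PER_SPRITE = 14   # BMP header + first sprite line, 4-pixel-wide sprite

cycles_per_line = LINE_TIME_S * CLOCK_HZ
sprites_per_line = int(cycles_per_line // CYCLES_PER_SPRITE)

print(f"{cycles_per_line:.1f} cycles per line -> {sprites_per_line} sprites per line")
```

    This is an upper bound: it assumes the OP gets every cycle of the line and that each sprite costs exactly 14 cycles.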

     

    The next thing to think about is a sprite that straddles 2 bands. There are 2 solutions:

    - cut the sprite and add it to both bands

    or

    - have the OP read 2 bands per line so it can finish drawing the previous sprite.

     

    I chose the 2nd one, because it takes less CPU time.

     

    Then, to leave the maximum bandwidth to the GPU and the OP, we should limit the use of the DRAM by the DSP and the 68k. For that reason, in the demo the 68k is stopped and the DSP doesn't use the DRAM to generate the sound (thanks to zerosquare! \o/)

     

    With so many sprites, we have no choice but to create 2 sprite lists ;)

    And we also need a lot of memory :D

     

    For the demo, the sprite list takes about 256 kbytes of memory. :whistling:

     

     

     

    In the spiral part, the GPU computes about 2688 sprite coordinates, but not all of these sprites are visible (about 1900 are always visible, and up to 2090 can be visible) :)

    It takes about 135 GPU cycles to add a sprite to the list. It's very optimised code: {read sin/cos value, compute x/y coordinates, clip x/y, compute BMP header, append the sprite to the list} for each sprite :)
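
    As a rough order of magnitude, the per-frame cost of building the list can be estimated in Python. This assumes, which the post does not state, that the 135 cycles apply to every one of the 2688 computed sprites (including the clipped ones) and that the GPU runs at the 26.59MHz quoted earlier.

```python
SPRITES_COMPUTED = 2688   # sprites computed per frame in the spiral part
CYCLES_PER_SPRITE = 135   # approximate cost to add one sprite to the list
CLOCK_HZ = 26.59e6        # assumed GPU clock, same figure as quoted for the OP

total_cycles = SPRITES_COMPUTED * CYCLES_PER_SPRITE
list_build_ms = total_cycles / CLOCK_HZ * 1e3

print(f"{total_cycles} cycles, about {list_build_ms:.2f} ms per list")
```

    Under these assumptions the list takes most of a frame to build, which fits with the need for two sprite lists and for keeping the 68k and the DSP off the DRAM.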

     

    To draw the spiral and move it with this precision, there is a very accurate cos/sin table :)

     

    All the GPU code takes about 2 kbytes, there are about 2 kbytes of tables for the glass effect, and there are only 4 free bytes left in the GPU memory! :D

    I also used some tricks in the GPU code, such as self-modifying code, to reduce its size :)

     

     

     

    ----------

    I think my post is not easy to read. It is late and my English is not good, so if you have any questions, I will answer ;)


    I'm not sure, but I think that none of the blitter's registers are double buffered.

     

    [edit: "The data registers may only be written to while the Blitter is idle." page 70/141 of the Jag_v8 documentation]

     

    Double buffering (for some registers) exists for the Jag2 blitter.
