Looking at the 96-97 code, I think I understand the Tside flashing. Once loaded, the flash can be checked for manufacturer and device, and a programming algorithm is selected based on that. I believe in all our cases that will be Intel. Please let me know if anyone has AMD parts.
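A minimal sketch of the ID check and algorithm selection, in Python for illustration only. It assumes the two ID bytes have already been read from the part. 0x89 and 0x01 are the standard JEDEC manufacturer codes for Intel and AMD; the function name and the returned labels are hypothetical and not taken from the 96-97 code.

```python
# Hypothetical sketch: choose a flash programming algorithm from the ID bytes.
# 0x89 (Intel) and 0x01 (AMD) are standard JEDEC manufacturer codes; the
# algorithm labels returned here are illustrative only.

INTEL_MFR_ID = 0x89
AMD_MFR_ID = 0x01


def select_algorithm(manufacturer_id: int, device_id: int) -> str:
    """Return a label for the algorithm to use, or raise for unknown parts."""
    if manufacturer_id == INTEL_MFR_ID:
        return "intel"
    if manufacturer_id == AMD_MFR_ID:
        return "amd"
    # Unknown manufacturer: refuse rather than guess at an erase sequence.
    raise ValueError(
        f"unknown flash: manufacturer 0x{manufacturer_id:02X}, "
        f"device 0x{device_id:02X}"
    )
```

Device codes are deliberately not decoded here; a real selector would probably need them to tell part sizes apart.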
Once the part is known, it is programmed to all zeros before being erased to the factory state ($FF). It looks as if there is about 1K of buffer space in the PRU for passing the new data. The buffer is formatted with three extra bytes per block: the target address in flash (high byte, then low byte), followed by the byte count. In this suggested case, the byte count would be 128 ($80), which would permit 8 blocks of $80 to fit in RAM for programming. The total buffer size would be ($80 + 3) * 8 = $418 bytes. I propose that the buffer start at $1BE8, so it would end at the top of PRU RAM, $1FFF. Eight calls to the programming routine would then program 1K bytes. I wonder if there is anything to be gained by buffering 1K, or if buffering 128 bytes would be more straightforward. Your thoughts?
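A sketch of how the proposed buffer could be packed on the PC side, assuming each of the eight blocks is laid out as address high, address low, byte count, then up to $80 data bytes. The constants mirror the numbers above; `build_buffer` is a hypothetical helper, not something from the 96-97 code.

```python
# Hypothetical sketch of the proposed PRU buffer: 8 records, each a 3-byte
# header (target address high, target address low, byte count) followed by
# up to $80 data bytes, placed at $1BE8 so it ends at the top of PRU RAM.

BUF_START = 0x1BE8                      # proposed buffer start in PRU RAM
RAM_TOP = 0x1FFF                        # top of PRU RAM
CHUNK = 0x80                            # 128 data bytes per record
HEADER = 3                              # addr high, addr low, count
RECORDS = 8

BUF_SIZE = (CHUNK + HEADER) * RECORDS   # ($80 + 3) * 8 = $418
BUF_END = BUF_START + BUF_SIZE - 1      # last byte used: $1FFF


def build_buffer(target_addr: int, data: bytes) -> bytes:
    """Pack up to 8 * $80 bytes destined for flash starting at target_addr."""
    if len(data) > CHUNK * RECORDS:
        raise ValueError("more data than one buffer holds")
    out = bytearray()
    for offset in range(0, len(data), CHUNK):
        chunk = data[offset:offset + CHUNK]
        addr = (target_addr + offset) & 0xFFFF
        out += bytes([addr >> 8, addr & 0xFF, len(chunk)])  # 3-byte header
        out += chunk                                         # data bytes
    return bytes(out)
```

For example, packing the first 1K of page 0 (target $2000) yields exactly $418 bytes, which would occupy $1BE8 through $1FFF.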
TSide page 0 ranges from $2000 through $FFFF, since there is no need to program the first 8K; the 1K buffer will need to be filled 56 times to complete that page. Similarly, page 1 ranges from $8000 through $FFFF, since there is no need to program its first 32K; that takes another 32 fills.
Altogether, the 1K buffer needs to be replenished 88 times (56 + 32) on the TSide. That is 704 calls (88 * 8) to the programming routine.
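The fill and call counts can be checked with a few lines, assuming the programmed ranges are inclusive ($2000 - $FFFF on page 0, $8000 - $FFFF on page 1) and that each 1K fill takes eight $80-byte programming calls. The names are illustrative only.

```python
# Sketch: how many 1K buffer fills and programming calls the TSide needs.
# Assumes inclusive ranges $2000-$FFFF (page 0) and $8000-$FFFF (page 1).

RANGES = {0: (0x2000, 0xFFFF), 1: (0x8000, 0xFFFF)}
BUF_BYTES = 1024      # one 1K buffer = 8 records of $80 bytes
CALLS_PER_BUF = 8     # programming-routine calls per 1K buffer


def fills_for(start: int, end: int, size: int) -> int:
    """Number of `size`-byte buffers needed to cover start..end inclusive."""
    length = end - start + 1
    return -(-length // size)  # ceiling division


fills = {page: fills_for(s, e, BUF_BYTES) for page, (s, e) in RANGES.items()}
total_fills = sum(fills.values())
total_calls = total_fills * CALLS_PER_BUF

print(fills, total_fills, total_calls)  # {0: 56, 1: 32} 88 704
```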
Again, perhaps filling a 128-byte buffer 704 times would make the code easier. I believe the slowest part of all this will be the serial transfer from PC to PCM.
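For comparison, a sketch of the 128-byte approach: walk each page's range in $80 steps and yield one record per programming call. It assumes each page's contents are available as a 64K byte image indexed by the same addresses used above; `iter_records` is hypothetical, and page switching on the PCM side is not modeled.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical sketch of the 128-byte approach: one record per programming
# call. `images` maps a page number to a 64K image of that page, indexed by
# the same addresses used above (an assumption for illustration).

RANGES = [(0, 0x2000, 0xFFFF), (1, 0x8000, 0xFFFF)]  # (page, start, end)
CHUNK = 0x80


def iter_records(images: Dict[int, bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (page, target address, $80 data bytes) for every record."""
    for page, start, end in RANGES:
        image = images[page]
        for addr in range(start, end + 1, CHUNK):
            yield page, addr, image[addr:addr + CHUNK]
```

Each yielded tuple would become one transfer and one call to the programming routine, 704 in total, so there is no partially filled 1K buffer to keep track of.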

Trying to estimate how long programming will take: we need to transfer 88K bytes, which is just over 90,000 bytes (88 * 1024 = 90,112). Each byte is sent as two hex digits, which doubles the byte count. With the overhead of ID, count ... checksum, I figure that programming the TSide needs to move something on the order of 200K bytes from PC to PCM. There will also be 704 replies, if all goes well, that need to be processed. From this crude figuring, it looks like about 250K characters need sending/receiving and processing. That looks to me like it might take 8 - 10 minutes.
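One rough way to sanity-check the 8 - 10 minute figure is to turn the character count into wire time. This assumes 10 bits per character (start bit, 8 data bits, stop bit), no gaps between characters, and no PCM processing delay; the baud rates are examples only, since the actual link speed is not stated here.

```python
# Rough serial-time estimate for the TSide transfer.
# Assumptions (not from the post): 10 bits per character on the wire
# (start + 8 data + stop), no inter-character gaps, PCM processing ignored.

PAYLOAD_BYTES = 88 * 1024        # 90,112 bytes of flash data
HEX_CHARS = PAYLOAD_BYTES * 2    # each byte sent as two hex digits
TOTAL_CHARS = 250_000            # crude total incl. overhead and 704 replies
BITS_PER_CHAR = 10


def minutes_at(baud: int, chars: int = TOTAL_CHARS) -> float:
    """Minutes needed to move `chars` characters at `baud` bits per second."""
    return chars * BITS_PER_CHAR / baud / 60


print(f"payload {PAYLOAD_BYTES} bytes -> {HEX_CHARS} hex characters")
for baud in (4800, 9600, 19200):  # example rates only
    print(f"{baud:>6} baud: {minutes_at(baud):4.1f} min")
```

Under these assumptions, 250K characters come to roughly 8.7 minutes at 4800 baud and about 4.3 minutes at 9600 baud, so the estimate hinges mostly on the link speed and on how quickly each reply comes back.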

Is 8 - 10 minutes reasonable and expected, or has my math gone off the rails?

I plan to try this out once my parts arrive from China; it takes so long to ship things... If this is a success, I will be able to suggest a way to reprogram a brick without desoldering the flash. Still, it won't be the easiest thing, because it requires access to the test connector, which means opening up the PCM and so on.

-Tom