Comments on Rescue 1130: 2014 Pickup of an IBM 1130 System and More: Testing demonstrations, completing replica cosmetic improvements

Setting up such an intensive test (again, AFTER VC...

2016-08-05T13:32:05.903-04:00

Setting up such an intensive test (again, AFTER VCF) seems to me a matter of implementing a CRC check (rather than a comparison), plus some additional logic on both sides of the link so that either side can request a re-transmit and the PC can log expected and actual data whenever a CRC error occurs. It does mean that you have a communication wrapper in place before the FPGA actually acts on what it receives and/or moves on from a send transaction. When the FPGA detects a CRC error on data coming in from the PC, it sends a special reply that echoes back what it got. The FPGA then waits for a re-transmit of the packet and keeps asking until it gets one that passes CRC. Transmitted and received data get logged on the PC for later review. (Naturally that reply from the FPGA also needs a CRC. If the reply itself gets a CRC error, the PC should log a double error, and the contents of the logged reply should be treated with skepticism. A burst of double errors also tells us something.) If a CRC error is detected on the PC side, a new transaction, "Say Again", is sent. Again, expected and actual data, as well as any "double error", get logged.
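
To make the idea concrete, here is a rough PC-side sketch in Python. It is only an illustration under assumptions of my own: CRC-32 from the standard zlib module, a 4-byte trailer, and a hypothetical `channel` callable standing in for the real USB link that returns the frame as the far side saw it. The names `wrap`, `unwrap` and `send_with_retry` are made up for this example, and a real echo reply would carry its own CRC so that double errors could be detected.

```python
import zlib

def wrap(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 trailer to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unwrap(frame: bytes):
    """Return the payload if the trailer matches its CRC-32, else None."""
    if len(frame) < 4:
        return None
    payload = frame[:-4]
    received_crc = int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None

def send_with_retry(payload, channel, log, max_tries=5):
    """Resend until the far side's copy passes CRC; log each failure.

    `channel` is a hypothetical stand-in for the link: it takes the
    frame we sent and returns the frame as the receiver saw it.
    """
    frame = wrap(payload)
    for attempt in range(1, max_tries + 1):
        received = channel(frame)
        result = unwrap(received)
        if result is not None:
            return result
        # Keep expected and actual bytes for later review.
        log.append({"attempt": attempt, "expected": frame, "actual": received})
    raise IOError("no clean frame after %d tries" % max_tries)
```

The log entries keep both the frame that was sent and the frame that arrived, so the failures can be analyzed afterwards instead of only being counted.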

Hi William It looks like single bit errors, altho...

2016-08-04T23:21:35.387-04:00

Hi William

It looks like single-bit errors, although I don't know whether any of them shift the frame or just change odd bits. I would have done a more careful design and built better error correction if I had known I would need it earlier than the week I was prepping for the show.

I think it would be a great idea for me to run a special intensive test of the link where I could definitively answer the questions you pose and understand the types of errors I will face.

First: Wow! What an amazing journey this 1130 and...

2016-08-04T15:36:51.282-04:00

First: Wow! What an amazing journey this 1130 and the other projects have been! Thank you so much for sharing it in such detail. There is a lot of really good historical lore being documented, in addition to the ton of work you're doing to preserve the hardware, the software, and the documentation. Now on to my specific comment:

It kind of jumped out at me that the error detection and correction you're doing for the USB interface is so very different from all the other error detection and correction you've implemented in the other projects. I remember you taking great care to cook up a SECDED Hamming code for the SDLC link between your original ztex FPGA and the early secondary FPGA for fan out (August 2015). We know that error detection and correction is tailored to the kinds of errors that happen on links. For example, disk errors tend to come in multi-byte dropouts because of surface problems, so a block CRC is used. Core and semiconductor memory drops out one word at a time, so parity/ECC codes are used.

When VCF is over and you have time to do some analysis, it seems prudent to investigate the nature of those USB errors. Are they bit errors? Byte errors? Dropouts of blocks? Because right now, this compare-and-retry approach (with progressively more elaborate retry logic) seems like it's going to get messier as more special cases of errors and more special cases of recovery emerge. If more were known about the nature of the errors, a simpler, more tailored error detection/correction setup might suggest itself.
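
As a starting point for that analysis, here is a small Python sketch that sorts a logged mismatch into one of those categories. It assumes the test harness records the expected and actual bytes for each failed transfer; the function name `classify` and the category labels are my own invention, not anything from the existing setup.

```python
def classify(expected: bytes, actual: bytes) -> str:
    """Roughly categorize the difference between two byte strings."""
    if len(expected) != len(actual):
        # Lost or extra bytes would shift the frame.
        return "length mismatch (dropped or inserted bytes)"
    # Positions where the two buffers disagree.
    bad = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
    if not bad:
        return "no error"
    # Total number of flipped bits across all differing bytes.
    flipped = sum(bin(expected[i] ^ actual[i]).count("1") for i in bad)
    if flipped == 1:
        return "single-bit error"
    if len(bad) == 1:
        return "single-byte error"
    if bad[-1] - bad[0] + 1 == len(bad):
        return "contiguous burst"
    return "scattered errors"
```

Tallying these labels over a long test run would show whether the link mostly flips isolated bits, corrupts whole bytes, or drops data, which is what decides between parity/ECC and a block CRC with retransmit.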