Tuesday, July 28, 2015

fpga controller for Pertec drive now communicates with PC; diagnosed problem with 5V timer board

PERTEC D3422 DRIVE RESTORATION

While I was debugging the general logic that I built to fire off a read of a target sector into the FIFO and then display it sequentially as bytes on the fpga board seven-segment displays, I decided to add in some additional logic to properly deal with the separate header and data fields in each sector.

The issue is that the drive resynchronizes between the two fields with a preamble of a couple hundred bits of zero, which means that I have to stop my byte assembly and resync to get the correct byte boundaries of the data field. Initially I just held the boundaries I established for the header field.

I know the byte count of a header field (and of a data field) - 5 and 257 respectively - following the preamble and sync byte for each. That was built into the FSM for reading so that it drops sync after storing byte five of the header, then begins again with the first payload byte of the data field.
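The two-field sequence can be modelled in a few lines of Python. This is only a sketch, not the VHDL in the fpga: it assumes the bit stream is a list of 0/1 values, that the sync byte is eight consecutive 1 bits, and that bits arrive least significant first. The names read_field and read_sector are invented for the illustration.

```python
HEADER_BYTES = 5    # cylinder high, cylinder low, head/sector, CRC 1, CRC 2
DATA_BYTES = 257    # data field byte count as stated in the text


def read_field(bits, pos, nbytes):
    """Resync on eight consecutive 1 bits, then assemble nbytes bytes.

    Returns (list_of_bytes, position_after_field).
    """
    ones = 0
    # Hunt for the sync byte: skip preamble zeros until eight 1 bits in a row.
    while pos < len(bits) and ones < 8:
        ones = ones + 1 if bits[pos] else 0
        pos += 1
    if ones < 8:
        raise ValueError("sync byte not found")
    out = []
    for _ in range(nbytes):
        chunk = bits[pos:pos + 8]
        if len(chunk) < 8:
            raise ValueError("bit stream ended inside a field")
        # First bit off the disk is the least significant bit of the byte.
        out.append(sum(b << i for i, b in enumerate(chunk)))
        pos += 8
    return out, pos


def read_sector(bits):
    """Read the header, drop sync, then resync and read the data field."""
    header, pos = read_field(bits, 0, HEADER_BYTES)
    data, pos = read_field(bits, pos, DATA_BYTES)
    return header, data
```

Calling read_field a second time is the software equivalent of dropping sync after byte five of the header: the byte boundaries of the data field come from its own sync byte, not from the header's.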

Testing at lunchtime zoomed in on a couple of problems, which I corrected. By the end of my lunch hour I had the logic reading the five header bytes (cylinder high byte, cylinder low byte, combined platter/head/sector, CRC byte 1 and CRC byte 2).

I still have something zeroing out my FIFO so that I can't find any contents even though I received the five byte pulses that should have been stored. It may be my reset action between header and data fields, or it may be when I stop at the end of the sector. I will re-instrument and check again in the late afternoon.

I began designing a simple transactional communication method that would use the Digilent USB link and the Adept utility. With that in place, I should be able to drive this from the PC, setting up cylinder and sector targets, commanding seeks and reads, plus requesting bytes from the FIFO.

With that all prepared, I went out to the workshop and did some testing. I still have the undesirable behavior of the +5V Timer Board, powering down the drive after 6 minutes. This should only occur when the drive is idle and spun down (safe light on). I will need to do some circuit testing as this timeout is pretty annoying, introducing a whole cycle of stop and start which wastes minutes.

I also have communication between the PC and fpga board. I can command a read or seek and I see the FSMs kicking off, but they stall. While I have some defects, the ability to issue commands and check signal status via the PC is a great convenience.

I suspect either the 7400 or 7493 chip on the timer board - this is the logic that holds the decade counter in reset so that it never times out to shut off the main +5V supply. The 7400 is an extraordinarily basic (quad NAND) chip that for some reason I don't have in my pile of chips. The fact that I am missing the 7493 counter is less surprising but I do need that chip as well. My only option to grab those tonight, Fry's, is a washout as they don't pretend to carry the 7400. They stock various other slightly incompatible family versions, e.g. 74LS00, 74HC00, and 74HCT00.

I will drive to Anchor tomorrow morning to pick up the actual parts I need. Tonight I lifted the lead off the board from the driver (7400) chip that would hold the counter chip in reset, instead tacking a connection to ground to that circuit. I ran the drive for ten minutes without the timer shutting me down, so that makes the 7400 the suspect chip of the two, but I will pick up both as a preemptive measure.

Before I remove the bad chip, I took my remaining time tonight to debug my fpga controller logic a bit further. I could see that my read command was accepted and the main logic thought that it kicked off the sector read logic to search and match the target sector. However, something isn't correct in the sector read logic so my instrumentation moved to trace that portion of the logic.

Now with the next set of tracing, I see the read sector logic finding the proper sector number and turning on the final status where it is reading incoming bytes - however, the intermediate step which fires off the synchronizer seems not to occur - the scope wouldn't trigger on this status. 

New instrumentation is in place but it is late tonight, so I will postpone further testing until tomorrow.

Monday, July 27, 2015

Debugging read/display logic in fpga based controller for Pertec D3422 disk

PERTEC D3422 DRIVE RESTORATION

The fpga board has four seven segment display characters. I use the last two to display the hexadecimal value of the current byte, while the first two record the hexadecimal address of that byte. A button allows me to start at the first character and to step through one byte at a time. I can set up a five bit address in the slide switches on the board, with that set as the sector or the current cylinder by pushing either of the two remaining buttons. This should be sufficient for my initial testing.

I received confirmation from Mike and Martin, two Altair Pertec hard disk restorers, that the Altair controller wrote the head value in ones-complement, just as I saw when reading the tracks.

Another change I introduced was to pulse the Start/Stop line active for only 1 us, doing this about 100 times a second, each time resetting the timer in the +5V Timer board so that it will not shut down the drive.

Ultimately, I will emit the pulsing only when the controller is executing some command. That way, when there is an extended period where the drive is really idle, the board will spin down the drive and shut off most of the power consumption.

I worked through the debugging of all the new or changed logic in the fpga, then fired up the system for the first tests in the early evening. The first problem I found was that the Start/Stop pulsing idea was a non-starter. It attempted to start the disk, but wasn't long enough to complete the latch for spinup, so then it timed out and went back to safe just in time to receive the next short pulse.

I wasn't sure what was working or not working properly, since the overall function didn't seem to give me a number of bytes to display. When I push the button to advance to the next sequential byte, the byte counter leaps to the maximum value.

After some adjustments, I fixed that issue but wasn't sure the machinery was working properly. This required setting up a number of diagnostic outputs - steady state conditions can be displayed on the 8 LEDs on the board, but four of those also had an external connection where I could hook the scope to capture pulses or short conditions. Thus, four steady state diagnostic outputs and four outputs that could be either very short/dynamic or steady.

Digging through complex mechanisms with many states, tracking key signals, and matching them to the real time disk signals - all this takes time. I exhausted the time today and have more to do tomorrow. 

Sunday, July 26, 2015

Reading headers for sector zero on all platters, heads and cylinders

PERTEC D3422 DRIVE RESTORATION

My improved logic for reading is now cleanly finding sector zero and the sync byte is clearly visible after the preamble of many zero bits. I looked into my synchronizer logic to figure out why it wasn't working properly. It should kick off the byte assembler which takes all the subsequent bits and packages them in bytes with a notification for the consuming circuit as each is ready. However, that isn't occurring.

Synchronizer byte followed by cylinder head and sector value from sector zero
The format used on this disk, which is likely the format used with Altair computers back in the dawn of personal computing, is very different from that used with the 1130. The drive interface itself is pretty similar and part of my logic will make the transition, but quite a bit will have to change.

The Altair disk has 24 sectors per rotation, with 256 bytes of data in a sector. After a preamble of a couple of hundred bits of zero to match the clocking of the recorded data, there is a sync byte (xFF) that defines which bit is number 0 in a byte, so that the stream of bits can be divided properly into bytes. The first bit encountered is the least significant digit of its byte.

In the Altair format, after the sync byte, there is a header which records the cylinder, head and sector of this record. It has a CRC checksum of two bytes. There is another preamble of a couple of hundred zeroes following the header, then the data portion of the sector begins with a sync byte of xFF. The data field consists of 256 bytes of data and a two byte CRC, and is followed by a couple of hundred zeroes.

The 1130 format disk has 4 sectors per rotation, with 321 words of 16 bits in a sector (642 bytes). After a preamble of a couple of hundred bits of zero to match the clocking of the recorded data, there is a sync word. Each word on the disk has its individual ECC bits at the end, thus a recorded word is 20 bits long. The sync word is 00000000000000011110, which is x'01' with its ECC. The first bit encountered in a word is the least significant.

In the 1130 format, after the sync word there are 321 of the twenty bit words that comprise the sector. No separate header field is used. The first of the words contains the relative sector number but that is purely a convention of the DMS2 operating system, whereas the controller treats all 321 words as the data field. There is no CRC since each word was protected by its own ECC.  More zeroes follow the end of the data field, continuing until the next sector pulse.

On the Pertec drive, the sector count (sector number) from the disk drive changes to 0 at roughly 2.4 us before the sector pulse leading edge. Thus, when the index pulse and sector pulse sequence completes, the drive heads are at the beginning of sector zero. The sector counter is leading the sector pulses on the Pertec drive.

On the 1130 drive, the index and sector pulses occur 180 degrees away from the read/write head location. Thus, when the index and sector pulses occur, the heads are at the beginning of sector 3, the prior sector. The sector number reported is 0. The controller has to wait until the following sector pulse, when the sector number field is just about to switch from 0 to 1, before it begins read or write operations for sector zero. Thus, the sector number is anticipatory, used by software to issue a read or write for the upcoming sector.

With the Altair controller on the Pertec drive, the user issues a command to read or write a specific sector, with the controller doing the match to the sector number as the trigger to read or write. This is unlike the 1130, thus the sector number is not anticipatory on the Pertec. Instead, it indicates that the sector pulse trailing edge demarcates the beginning of this particular sector number.

I had the synchronizing and byte assembly working fine by the mid-afternoon. Decoding the bytes as they were read from the headers, I did see exactly what I expected from the cylinder field and the sector portion of the sector/head byte. I forced the arm to three locations - cylinder 0, cylinder 3 and cylinder 256 and found the proper value being read for each of them.

The format of the sector/head byte is that the three most significant bits encode the head, while the remaining five bits encode the sector number. I selected the four different heads in turn and looked at the head bits. They were 111, 110, 101, and 100; however, according to the Altair controller documentation these are not the right values. In fact, they should be the inverse - 000, 001, 010 and 011.
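To make the bit layout concrete, here is a small Python sketch of splitting that byte. It assumes the layout described above (head in the top three bits, sector in the low five) and treats the inverted head bits as an option rather than a settled fact. The function name and the head_inverted flag are invented for the example.

```python
def decode_head_sector(value, head_inverted=True):
    """Split the combined byte into (head, sector).

    The top three bits carry the head and the low five bits carry the sector.
    head_inverted=True treats the head bits as a ones' complement value,
    an assumption that matches the 111, 110, 101, 100 readings.
    """
    head = (value >> 5) & 0b111
    if head_inverted:
        head ^= 0b111          # flip all three head bits
    sector = value & 0b11111
    return head, sector
```

With the inversion applied, a byte of 11100000 decodes as head 0, sector 0, which is what the documentation says should be there for the first head.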

Pattern of 111 (plus sector 00000) in left byte is inverse of expected 000
Setting that issue aside for more investigation - such as looking through the source code of the Altair controller board - I moved on to building out my logic for reading. I generated a FIFO buffer to store bytes as they are retrieved from disk and then fetch them for display or other use.

The fpga board has four seven segment display characters. I use the last two to display the hexadecimal value of the current byte, while the first two record the hexadecimal address of that byte. A pair of buttons allow me to start at the first character and to step through one byte at a time. This should be sufficient for my initial testing.

I started to update the control logic of the board for this read testing, removing the seek testing since everything looked perfect with that functionality. Every cylinder I went to had the correct cylinder number in the header record and read cleanly. 

I did take another (amateur quality) video of the positioner testing, now that I can loop seek patterns as well as throw in some singletons. Another video of positioner activity this time with no talking.

Saturday, July 25, 2015

Seeking okay now verifying reading functionality of Pertec D3422 drive

PERTEC D3422 DRIVE RESTORATION

I began to suspect that my problem had more to do with when I deasserted the Seek Strobe, relative to the address lines changing, since the drive acts on the trailing edge of the strobe signal. I made some changes to the FSM based on that suspicion and gave it a quick test last night.

Video of drive under control of fpga testing controller - watch to see it spin up, seek out and back.

The mechanism was now seeking out and in with the half-range (out 203 cylinders and then back to home) which corresponds to a 2310 disk drive capacity. This morning I ran some more tests of different seek amounts including looping on 203 and 406 cylinder long seeks, and some very short seek loops. It all worked well - jumping between 0 and 405, small range seeks, looping seeks. I also validated that I get the Address Interlock signal (invalid seek address) with an address beyond 405.

I invested my morning in writing the initial code to handle reading from the disk. The data is encoded with a self clocking scheme similar to that used on tape drives, where gaps between sectors and fields are uniformly magnetized but all data is recorded with a clock at double the data rate. My extracter circuit signals me when each bit is available and makes the value available.

The self clocking scheme means that any pattern of data, whether the bits are zero or one, all start with a flip of the magnetic field to mark the clock and then a second flip only if the data will be a logical 1. If the data is zero, the second time interval has no flip. Each pair of intervals, the clock pulse and then the data-dependent second pulse, is called a bit cell.

Each recorded area on a disk begins with a preamble of a couple of hundred zero bits. This is a pattern with the clock pulse and the absence of the second data pulse. This allows the receiver electronics to synchronize with the pulses and know whether a pulse it sees is the clock or the data pulse.

The 200 bits allow it to get synchronized so that the electronics in the drive can split out the clock signal and the data signal. My fpga receives a steady uniform set of pulses on the Read Clock line and during the second half of each clock cycle, if there is a pulse on the Read Data line, then we record the value as a logical 1. No pulse from Read Data in the second half of a clock cycle and we read this as a logical 0.
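As a rough software model of the bit cell idea (the drive electronics do this separation in hardware, so this is illustration only), the sketch below treats the recording as a list of half-cell slots where 1 means a flux flip. The function names and the 16-slot window are assumptions chosen for the example.

```python
def encode(bits):
    """Each bit cell is a clock flip, then a data flip only for a logical 1."""
    halves = []
    for b in bits:
        halves += [1, b]
    return halves


def find_clock_phase(halves, window=16):
    """Use the all-zero preamble to decide which slots are clock slots.

    During the preamble only clock slots contain flips, so whichever
    alternating set of slots has more flips is the clock phase.
    """
    even = sum(halves[0:window:2])
    odd = sum(halves[1:window:2])
    return 0 if even > odd else 1


def decode(halves, phase):
    """Read the slot that follows each clock slot as the data bit."""
    return [halves[i + 1] for i in range(phase, len(halves) - 1, 2)]
```

The point of the preamble shows up in find_clock_phase: without a known run of zeroes, a stream that starts in the middle of a bit cell gives no way to tell a clock flip from a data flip.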

Following the string of zero bits is a specific pattern that is recognized by the controller, which so far is synchronized only on bit cells and clocks, but not on byte or word boundaries. The specific pattern defines which bit cell is the beginning of each byte/word of data. Following the special pattern is whatever data format was recorded, ended with an error checking Cyclic Redundancy Check of 16 bits and another specific pattern marking end of the record.

The Pertec drives have an initial set of data that records the cylinder, head and sector number of this record. It should certainly match the cylinder to which we did a seek and the platter, head and sector from which we are reading. After the CRC for the header, there is an erased gap, another 200 bits for clock synchronizing, another specified pattern for byte synchronizing, 256 data bytes, then the CRC and end character. This happens 24 times around the track.

IBM 2310 drives don't have a formally separate header. They use the same preamble of 200 bits of zero, a specific pattern for byte boundary synchronizing, but then have 321 16 bit words in a monolithic sector, capped with a CRC field and end character. By convention the first of the 321 words contains the cylinder/head/sector information, as a relative sector from the start of the pack, leaving the remaining 320 words of the sector for data. The 2310 drive only provided four sectors around a track.

I began my logic design with the logic to convert the Read Data and Read Clock inputs into a stream of serial bit values (extracter circuit). Another bit of logic recognizes the byte boundary synchronizing character (synchronizer circuit). A third bit of logic then takes a serial stream of bits and turns it into parallel bytes (assembler circuit). A higher level state machine would move from gap (idle) state to preamble to byte synch and then store data bytes in a memory on the board, but I haven't designed that yet.

When I tried to test this I quickly ran into the problem that unless I was immensely lucky, switching on the read enable before or in the midst of a preamble, I would run into a character that was neither 0x00 nor 0xFF, triggering a sync error. This leads to two changes - first, I need to recover from sync errors rather than lock in the status, and second I need to synchronize the start sync operation with the beginning of a sector.

Initially, I set up the controller to look for sector zero and read that - index pulse triggers the FSM and then I enable the read electronics, begin synchronizing and let it rip. The synchronizing logic looks for a string of at least 100 zero values, then eight sequential 1 bits, after which it should be assembling bits into bytes and flagging their availability.
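The synchronizing rule described above, with recovery from sync errors rather than locking up, can be sketched in Python as follows. This is a model of the rule only, not the VHDL. The threshold of 100 comes from the text, while the function name and the choice to simply restart the hunt on an error are assumptions made for the illustration.

```python
MIN_ZEROS = 100  # minimum preamble run before a sync byte is accepted


def find_sync(bits, start=0):
    """Return the index of the first bit after the sync byte, or None.

    A 1 bit seen before MIN_ZEROS zeros, or a run of ones shorter than
    eight, is treated as a recoverable sync error: the counters reset
    and the hunt continues.
    """
    zeros = ones = 0
    for i in range(start, len(bits)):
        if bits[i] == 0:
            if ones:                 # ones run broken early: start over
                zeros, ones = 0, 0
            zeros += 1
        else:
            if zeros >= MIN_ZEROS:
                ones += 1
                if ones == 8:
                    return i + 1     # byte assembly starts here
            else:
                zeros = 0            # a 1 too early: not a real preamble
    return None
```

Note that a long run of zero data bytes followed by xFF would fool this just as easily as a real preamble, which is why the hunt also needs to be started at the beginning of a sector.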

When I see the sector counter change from zero, I will turn off the read electronics which stops the flow of clock and data pulses. I didn't check CRC or handle the gap and data record. I only cared about the header and its confirmation that I was reading the proper sectors.

In my testing, I discovered that the +5V Timer Board feature will power down the drive after 6 minutes, whether or not I am doing seeks or reads. I needed to figure out what signal states are needed to keep this from happening. It turns out that while the normal practice is to drop the line to activate it, hold it for a millisecond or so, then return it to high, that allows the timer board to count down and shut off.

My logic now has to drop the start/stop line, keep it low, then if I push the button again to shut down, it will have to return the line to high for an interval, pulse it down, then leave it high. It adds a few stages to the FSM. After testing, I still have problems. If I hold the line low after the drive starts, it locks out the button on the front of the drive. If I leave the line high, the timer board shuts things down in six minutes. This is being punted until tomorrow, as it is lower priority than completing the read testing.

I can see my data recovery working, but something is going wrong on the path to assembled bytes. To debug, I routed several of the key signals out where I could watch for them on the scope. My extracter logic is working perfectly - when I turn on reading, the data values begin to fly across the wire.

I could see that my method of detecting the start of sector is inadequate, thus I had to improve it in order to definitively locate and start reading at sector zero. I thought about it for a while and updated the FSM accordingly.

Now I am clearly oriented to the beginning of sector zero. I see the bits being extracted, a run of zeroes followed by an all-ones byte (xFF), which should kick off my byte assembler but it isn't. The synchronizer logic is where the fault lies, so I will ponder that a while.

My seek loop would stall after many repetitions - the fault appears to be in my testing FSM which is waiting for busy to blip up and down. I suspect that some conditions occur where the rise has already happened before I test for it. 
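The kind of race suspected here can be shown with a toy Python model. It assumes the busy line is a list of sampled 0/1 values and that the FSM starts watching at some sample index. Both functions are hypothetical, and the timeout variant is only one possible way around the stall, not necessarily the fix that belongs in the controller.

```python
def naive_wait(busy, start):
    """Wait for busy to rise and then fall, sampling from index start.

    Returns the sample index where completion is seen, or None if the
    wait never finishes (the stall).
    """
    i = start
    while i < len(busy) and busy[i] == 0:    # wait for the rise
        i += 1
    while i < len(busy) and busy[i] == 1:    # wait for the fall
        i += 1
    return i if i < len(busy) else None


def wait_with_timeout(busy, start, max_wait):
    """Same wait, but stop waiting for the rise after max_wait samples.

    If busy never rises in that window, assume the blip already happened.
    """
    i = start
    limit = min(len(busy), start + max_wait)
    while i < limit and busy[i] == 0:
        i += 1
    if i == limit:
        return i                             # blip presumed missed
    while i < len(busy) and busy[i] == 1:
        i += 1
    return i if i < len(busy) else None
```

If the whole blip has come and gone before the first sample is taken, naive_wait sits in its first loop forever, which matches a loop that runs many times and then hangs.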

Thursday, July 23, 2015

FPGA controller controlling arm on Pertec disk drive, but needed change to cabling for reliability

PERTEC D3422 DRIVE RESTORATION

The interface on my drive is defined by the model number as "special", which means it doesn't match the schematics in any Pertec document and more importantly, it doesn't make use of the terminator power pin on the main connector to the interface board.

The drive produces 3.3V as pullup power for the terminator resistors, but this interface board ignores that. Instead, it feeds terminator power to the interface on pins 1 and 2 of one of the two 50 conductor cables. I shall have to do the same, feeding from my fpga board.

It was time to bring up the fpga board and check out the voltage levels on the incoming signals, in addition to powering up the drive itself. I proceeded to fire up the drive and then use the fpga board to command a spin-up. The voltage checks without power produced confusing results and I had to do some troubleshooting. Turns out that the ground of the interface is also not connected to the drive ground so I had to include that wiring.

Once the terminator was pulling voltages up and I was receiving reasonable status, I began to test out the functionality. The results were good but not perfect for this first run. Successes:

  • A button push would command spin up or shut down of the drive
  • When I set up a nonzero cylinder address and hit the seek button, it moved the arm
  • When I pushed the restore button, it brought the arm back to cylinder zero position
  • The Unit Ready signal came out as the drive came ready
  • Write Protect lights and extinguishes as I change the drive write protect switches
Failures to fix up include:
  • Write protect status appears on the wrong led compared to what I expected
  • I don't see the sector and index pulses on the LED, which makes sense given their short duration and high frequency
  • I could see that one pair of the switches for entering a cylinder address are swapped positionally
  • Emergency Unload does not happen at my button push even when held low for a long period
  • my logic to display the sector count register appears defective
  • sometimes the seek or restore buttons didn't work, other times they did
I can get erratic status led illumination if I wiggle the connectors onto the fpga board, so these are not of sufficient quality. I have female connectors on my cable and the fpga board also has female connectors, so I was relying on some male-male adapters but they fit loosely in the connectors. This may work for initial checkout but I need a more reliable cabling scheme for when I begin to use the drive in earnest.

Looking at the signals I use, versus irrelevant ones (like double density, double or quad platter because I know what I want), plus dropping the busy and select lines for drives 2, 3 and 4 that I won't have connected, leaves me with exactly 39 interface signals to receive or drive.

That is a fortuitous number, because it fits within the 40 I/Os of the FX2 extension board I have on the fpga board. This extension board will allow me to solder all the wiring onto the board, rather than rely on flaky connectors and adapter pins.

I will begin rewiring the interface all to the extension board, dropping signals I will ignore, and documenting the changes so I can update the fpga logic appropriately. This is a must-do for today before I try to accomplish more testing. I also had a wire snap off one of the connectors, which is inevitable with a rickety ad hoc cabling. 

Investigating my problem areas from the first test, I found and corrected a few of the issues. Others will take more testing, particularly the state machines for my command buttons and the sector count display logic. 

I put a voltmeter on the signal for Emergency Unload, which should be pulled up to +3 but then will be yanked down to ground when I drive the fpga output pin to logic level 0. It instead only drops to a bit under 2V when my signal is asserted. This means something else is holding this line up, which I have to diagnose within the disk drive. The wiring from the schematics shows this signal hooked directly to the input of an inverter, a section of U50 on the logic board, so that is where I need to look. 


By dinner time, I had soldered in 22 of the 39 signals, those from one of the two 50 conductor cables, and went back to the workshop afterwards to finish the last 17. I will then have to update my documentation and the vhdl code to pick up the signals on these new pins. The power on test showed it was correctly set up.

The sector number display is whizzing through the numbers 00 to 23 that occur on each rotation. This validates the logic to see the index and sector pulses, plus the counter that tracks which sector is under the heads at any moment.
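A minimal Python model of that sector counter, for illustration only. It assumes the index pulse arrives just ahead of the sector pulse that starts sector zero, and represents the pulses as a list of 'I' and 'S' events. The function name and the event encoding are made up for the sketch.

```python
SECTORS_PER_TRACK = 24


def track_sectors(events):
    """Follow the sector number from index ('I') and sector ('S') pulses.

    Returns the sector number after each sector pulse. The value is None
    until the first index pulse has been seen, since the position on the
    track is unknown before then.
    """
    count = None
    index_seen = False
    seen = []
    for ev in events:
        if ev == 'I':
            index_seen = True            # next sector pulse starts sector 0
        elif ev == 'S':
            if index_seen:
                count = 0
                index_seen = False
            elif count is not None:
                count = (count + 1) % SECTORS_PER_TRACK
            seen.append(count)
    return seen
```

Run over one full rotation after an index pulse, this produces 0 through 23 and then starts again at 0, the same sequence that whizzes by on the display.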

It appears that a Restore Initial Cylinder wouldn't work if I had a non-zero address in the cylinder address switches. On investigation, the reason is that I am violating setup times for that signal. A seek strobe is issued essentially simultaneously with setting that restore line, whereas it needs to be settled before the seek strobe occurs. I made changes to my state machines to correct this.

The problem that is still open involves the emergency unload signal, as I mentioned above. I set up the voltmeter to monitor what happens to that line when I try to drag the output pin of the fpga down to zero. I may have a bad chip or two in that path. What I see is the input to the first inverter stays at full high (3.29V) regardless of my attempts to pull it down.

It behaves as if something is doing a hard pullup of the pin on the logic board, so that my attempt to drop it simply divides the voltage across the resistors in the circuit from the fpga to the logic board inverter. This could be a failed inverter that is outputting on the input pin, but I suspect this is in my special interface board somewhere.

I exercised the positioner quite a bit tonight, moving it small and large amounts in both directions. My corrected logic for the Restore Initial Cylinder function now works - it moves the arm slowly back instead of the jump that occurs if it were a seek to cylinder zero.

Tonight and tomorrow I will change the controller to execute some programmed sequences of seeks, plus I will set up to switch between the four surfaces (two platters, two heads for each). That will be the last of the commands I can try that don't involve reading or writing on the disk.

It will also have a switch to turn on the read enable line, to see if it attempts to read data. I can set up a scope to watch what streams in from the heads. The fpga board has plenty of onboard memory capacity, so that I could read in the bitstream and begin analyzing it.

Wednesday, July 22, 2015

Matching timing to needs of the Pertec drive and cleaning up my controller logic

PERTEC D3422 DRIVE RESTORATION

This morning I sorted out the problem with the Activate Emergency Unload signal, which now immediately retracts the heads and spins down the disk, as it should. I will move on and update the controller logic to accomplish several patterns of seeks to exercise the drive, plus begin watching to see if the drive will attempt to read data from the disk.

At lunchtime, I whipped up the new controller logic and began testing. The two looping patterns keep the drive's "busy in seek" lamp partly illuminated, so I know that the commands are issued rapidly. What I don't see is the motion, so I have a flaw in my state machine or setup of the address lines.

After a short look at the Pertec documents I realized I am not meeting all the setup and stability requirements, which are a minimum of 0.5 us and sometimes 1 us. With an fpga that ticks along at a 0.02 us clock rate, I need to wait 35 to 50 cycles whenever I change lines. Another issue is the lagginess of status signals. Specifically, when you command a seek or restore, the drive may take up to 1 us before the busy signal goes on, thus my state machine thought that the drive had already finished. In fact, the shortest time even with zero movement or an addressing error is 2 us, 100 cycles.
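For reference, the bare minimum cycle counts behind those waits (before any margin is added) can be worked out with a couple of lines of Python. The only input is the 0.02 us clock period from the text, which corresponds to 50 MHz. The helper name is made up for the example.

```python
CLOCK_PERIOD_US = 0.02   # fpga clock period from the text (50 MHz)


def cycles_for(duration_us):
    """Smallest whole number of clock cycles that covers duration_us."""
    # Work in whole nanoseconds so the division is exact integer math.
    period_ns = round(CLOCK_PERIOD_US * 1000)
    duration_ns = round(duration_us * 1000)
    return -(-duration_ns // period_ns)      # integer ceiling division
```

On these numbers, 0.5 us is 25 cycles, 1 us is 50 cycles, and the 2 us shortest busy time is 100 cycles, so any wait counter sized above those values meets the stated minimums.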

I modified my state machines to stick in those wait durations before lunch was over and went out to test some more. First shot, didn't work. This evening I stuck in some diagnostic indications and retested (as well as looking over my logic for flaws). Even with greatly extended times, the FSM wasn't working. More seriously, once I had every valid state of the FSM lighting an indicator, I saw that it would often get lost in some invalid state - not covered by 'others' or any valid state.

I have to figure out why I have this flaky FSM behavior as that seems to be the likely source of the problem, not my drive nor the intent of my logic. This will take some study. I guess I can work out the logic to read in and decode the header section of the first sector while I mull over the main problem.

Hooked up fpga controller to Pertec D3422 drive for testing

Today I was extremely busy with other activities, such as my job, but I was able to get in to the workshop at 5PM and get a bit done.


PERTEC D3422 DRIVE RESTORATION

I set up the testing area to verify voltages on the connectors before I hook them into my fpga board - fired up the disk drive and then probed the lines. No high voltages were present and no short circuits, so it was safe to hook the connectors to the Nexys2 fpga board and begin the next round of testing.

Connecting the 48 small connectors to the fpga board took time to get right, but by the early evening I was ready to put in some preliminary vhdl to the board and fire it up. The intent was to watch the status lights and seven segment display to see if the status appears correct.

It became obvious that the termination resistors weren't connected to +5V inside the disk drive, thus not pulled up as they should. I know which connectors on the interface card carry these voltages, so time to trace out what I need to do in order to fire up the termination properly.

I haven't finished all of the logic I wanted, but put enough in to try a few commands. Specifically, I could start or stop the drive, command it to do an emergency unload of the heads, and ask it to seek to a given cylinder or the home position. Once the terminator pullup power is in place, I can give it a try.