Wednesday, August 16, 2017

More code conversion and programming for the UofM card archiving task

UNIVERSITY OF MICHIGAN CARD DECK ARCHIVING

I had volunteered to bring my Documation card reader and a copy of Brian Knittel's PC interface to the museum and archive almost 80 decks of cards that hold experimental data from 1970. I captured an exact image of every card. The capture is lossless because it doesn't assume any particular encoding; it simply records all 960 hole positions (80 columns by 12 rows) on each card.

I sent them to U of M along with the Deckview program (also from Brian) so they could look at all the card images. I described the file format: a binary file with one 16-bit word per column, thus 160 bytes per card image. Each column word holds rows 12, 11, 0, 1 through 9 in its leftmost 12 bits, with 0000 as padding in the remaining four bits.
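A minimal Python sketch of reading such a file, splitting it into 160-byte cards and each card into 80 column words. The byte order of the 16-bit words is not stated above, so it is an assumption here: little-endian by default (typical for a PC program), with a parameter to switch. The names read_cards and punched_rows are hypothetical, not part of Deckview or Brian's interface software.

```python
import struct

# Row labels in the order they occupy the 12 high-order bits of each
# 16-bit column word: bit 15 = row 12, bit 14 = row 11, ..., bit 4 = row 9.
ROWS = [12, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
COLUMNS = 80
BYTES_PER_CARD = COLUMNS * 2  # one 16-bit word per column = 160 bytes

def read_cards(data: bytes, byteorder: str = "<") -> list[list[int]]:
    """Split a deck file into cards of 80 column values.

    Each word is shifted right by 4 to drop the 0000 padding, leaving a
    12-bit value with row 12 in bit 11 down to row 9 in bit 0.
    byteorder is a struct prefix: "<" little-endian (assumed), ">" big-endian.
    """
    if len(data) % BYTES_PER_CARD:
        raise ValueError("file length is not a multiple of 160 bytes")
    fmt = f"{byteorder}{COLUMNS}H"
    cards = []
    for offset in range(0, len(data), BYTES_PER_CARD):
        words = struct.unpack_from(fmt, data, offset)
        cards.append([w >> 4 for w in words])
    return cards

def punched_rows(column: int) -> list[int]:
    """Return the row labels punched in a 12-bit column value."""
    return [row for i, row in enumerate(ROWS) if column & (1 << (11 - i))]
```

Usage would be along the lines of `cards = read_cards(open("deck.bin", "rb").read())`, where `deck.bin` stands in for one of the captured deck files.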

I received an email asking why they aren't in Excel or CVS (sic) format, as that is how they want them. I guess they don't have any programmers available there, so I dashed off a quick Python program to deal with the decks.

Almost every deck has a header card, with holes punched in a pattern that forms the big letters M, T and S, each spanning six columns. The hole patterns in the columns that make up these letters are almost never valid characters. There is some other data to the side, including the MTS job number, which can be tied back to the listings that wrapped the decks.
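Since the data cards use only blanks and single digit punches, one way to spot the header card automatically is to look for any column outside that set. This is only a heuristic sketch: it assumes each column is already a 12-bit value with row 12 in bit 11 down to row 9 in bit 0, and the function names are hypothetical.

```python
# Assumed 12-bit column layout: bit 11 = row 12, bit 10 = row 11,
# bit 9 = row 0, bit 8 = row 1, ..., bit 0 = row 9, so digit d sits at bit 9 - d.
DIGIT_MASKS = {1 << (9 - d) for d in range(10)}

def is_data_column(column: int) -> bool:
    """True for a blank column or a single punch in one of rows 0-9."""
    return column == 0 or column in DIGIT_MASKS

def is_header_card(card: list[int]) -> bool:
    """Heuristic: any column that is not blank or a lone digit marks a header."""
    return not all(is_data_column(c) for c in card)
```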

First card as captured
The remaining cards contain only numeric characters or spaces in Hollerith. By inspection, these appear to be four-column numbers, 20 to a card, right-justified. Thus the number 639 would appear as " 639", while the number 1015 would be "1015" in its four columns.

Second and all subsequent cards seem to be 20 four-column integers
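Decoding these data columns needs only the ten digit punches plus blank. A sketch, again assuming the 12-bit column layout described for the file format (row 12 in bit 11 down to row 9 in bit 0); decode_column and decode_card are hypothetical names, and any other punch combination comes back as '?'.

```python
# Digit d is a single punch in row d, which sits at bit 9 - d of the column value.
DIGIT_FOR_MASK = {1 << (9 - d): str(d) for d in range(10)}

def decode_column(column: int) -> str:
    """Blank -> ' ', single digit punch -> '0'..'9', anything else -> '?'."""
    if column == 0:
        return " "
    return DIGIT_FOR_MASK.get(column, "?")

def decode_card(card: list[int]) -> str:
    """Decode every column of a card into one string."""
    return "".join(decode_column(c) for c in card)
```

For example, the right-justified " 639" is a blank column followed by single punches in rows 6, 3 and 9, so `decode_card([0, 1 << 3, 1 << 6, 1 << 0])` gives `" 639"`.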

I really can't tell whether these are indeed four-column integers (what a Fortran program would read with an I4 format) or some other division into fields, so dividing them into 20 comma-separated fields is pure guesswork.
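Because the field boundaries are a guess, it helps to keep the width as a parameter. This sketch cuts an 80-character decoded card into fixed-width fields and parses each as a right-justified integer, in the spirit of an I4 read. The names split_fields and parse_i4 are hypothetical, and returning None for an all-blank field is a choice made here rather than anything the cards dictate.

```python
from typing import Optional

def split_fields(text: str, width: int = 4) -> list[str]:
    """Cut a decoded card into consecutive fixed-width fields."""
    if len(text) % width:
        raise ValueError(f"{len(text)} columns do not divide into {width}-column fields")
    return [text[i:i + width] for i in range(0, len(text), width)]

def parse_i4(field: str) -> Optional[int]:
    """Parse a right-justified integer field; an all-blank field gives None."""
    digits = field.strip()
    return int(digits) if digits else None
```

With `card = " 639" + "1015" + " " * 72`, `[parse_i4(f) for f in split_fields(card)][:3]` is `[639, 1015, None]`. Trying a different guess at the layout is just a matter of passing another width, such as `split_fields(card, width=8)`.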

My quick program changed every character that is neither a digit nor a blank to an asterisk; these occur only on the header card. On the other cards, a comma was added after each number except the last one on the card. Voilà: Comma Separated Values (CSV) that can be read by Excel.
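The conversion just described can be sketched in a few lines of Python, assuming each card has already been decoded to an 80-character string. Treating any card that contains an asterisk as the header and writing it out whole, without commas, is an assumption; the description above doesn't say how the header line was laid out in the CSV. The function names are hypothetical.

```python
import re

FIELD_WIDTH = 4  # assumed four-column fields, 20 per card

def clean(text: str) -> str:
    """Replace every character that is not a digit or a blank with '*'."""
    return re.sub(r"[^0-9 ]", "*", text)

def card_to_csv_line(text: str) -> str:
    """Comma-separate a data card; pass a header card (any '*') through whole."""
    text = clean(text.ljust(80))
    if "*" in text:
        return text
    fields = [text[i:i + FIELD_WIDTH] for i in range(0, len(text), FIELD_WIDTH)]
    return ",".join(fields)  # commas between fields, none after the last

def deck_to_csv(cards: list[str]) -> str:
    """Convert a whole deck, one CSV line per card."""
    return "".join(card_to_csv_line(c) + "\n" for c in cards)
```

Passing the header through unsplit puts it in a single spreadsheet cell; splitting it like the data cards would be a one-line change.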

I hope this is the last I hear of this effort: a couple of days reading cards, each deck twice to ensure the data was captured correctly, plus all the dialog and now the data conversion.

