Background
We've had paper archiving on Canofile discs for over 20 years - and for nearly 20 years we've had Canofile data on a Pocket-Canofile disc system, where the original Canofile MO discs were copied to a CD with a reader program included. This was clunky but basically OK in the days of DOS, all the way to Windows ME.
I appreciate this is a different problem to that posed to people with actual MO discs, but for those with DOS Pocket-Canofile discs, and no copy of Canofile for Windows (and I have tried to get one), my admittedly clunky solution might just work.
Stuff I tried that didn't work
DOS boot stick / CD
This sometimes works: get FreeDOS or MS-DOS to boot from a memory stick or similar, save whatever job you're looking for on the stick, then boot back into Windows. The problem I had here was that the discs don't work on more modern hardware, so the result was similar to the Windows XP+ issues. Added to that, even when it did work, I haven't found a way of extracting the contents of the disc automatically in DOS, which means either going through this process every time you want to retrieve a job, or having someone sit there and do it all manually. With over 150 discs with between 80 and 300 records on each disc, that's going to be tedious.
DOSBox, DOS or Windows 98 as a virtual machine
All non-starters. The discs refuse to work.
So what did work, then?
What we need is an actual Windows 98 computer. That's not something most people want sat on their network full-time, and getting hardware with drivers that actually work is tricky. However, once you've got one, you can access the discs. You need to get the CD-ROM drive to work reading discs, and either the network to work, or some other way of getting the exported data off the Windows 98 computer, be that a CD/DVD burner, USB mass storage support, whichever.
If you've got one of these, you can interrogate the discs on a one-off basis as I mentioned in the DOS boot stick section, but to automatically rip the whole disc, we need some severe script-fu.
AutoIt
In the end I used a scripting language called AutoIt. It's pretty powerful, and can send keystrokes to programs and in some cases parse windows and things like that, and create dialogue boxes etc. The latest version doesn't work with Win9x systems, but version 3.2.12.1 does, and is available from http://www.autoitscript.com/autoit3/files/archive/autoit/autoit-v3.2.12.1-setup.exe
Prn2File
Unfortunately, the Canofile program uses a graphical text mode which means we can't screen-scrape to work out what the program is doing. We can print lists of what's on the disc though, and using prn2file.com, can redirect the print for parsing by the script. Prn2file is available here: http://homepages.rootsweb.ancestry.com/~fhcnet/fhctech/print2file.html
The Script
Note that the script is still fairly rough-and-ready, though I've taken it through several discs without any notable problems. I know the code is sloppy - it is my first AutoIt script after all, so has bits from all over the place in it. I'll update it as I refine it over the next couple of months. Please feel free to come up with improvements!
Some working assumptions: Your scripts and supporting files such as PRN2FILE are stored in c:\CANOFILE , and your CD-ROM drive is D:\
- Run the Prn2File then the Canofile program and wait for it to start
- Print a list for each side to c:\canofile\print.txt. We need to get the left and right-hand sides of the list, so we have a file name and a number of pages to aim for. We do this for side A and B (a relic from the MO format).
- Parse the list so we know how many records to process and how many pages are in each record
- Iterate through each record on each side and extract all the pages as Tifs. Wait for the highest known page number, then wait a few seconds for extra pages, as the page count isn't accurate if there are duplex scans. As we're limited to DOS 8.3 names, extract based on disc record number, all to the %temp% folder (due to an AutoIt bug with pressing Enter, which means we can't reliably set the export path to anything else)
- Once all the records are extracted, present a dialogue box asking how you want to name the folders with jobs in. This is necessary because the first file field doesn't always contain the job number, and the fields can be different between side A's and side B's too.
- Move the files from %temp% to c:\canofile\<disc name>\<job folder><side><recordNum>. We add the side letter and <recordNum> to avoid overwriting identically named jobs on the disc.
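The naming scheme for the extracted pages can be sketched as a small function. This is a minimal sketch, assuming what the main script assumes: the reader writes each page as side letter, record number, a dash, zero padding and the page number, filling the eight characters of a DOS 8.3 base name. PageFileName is a hypothetical helper name and does not appear in the script itself.

```autoit
; Hypothetical helper, not part of the original script: builds the 8.3 name
; the main script waits for, given a side letter, record number and page number.
Func PageFileName($side, $record, $page)
    Local $stem = $side & $record
    ; Characters left in the 8-character base name once the stem and the
    ; page number have been accounted for.
    Local $room = 8 - StringLen($stem) - StringLen($page)
    ; No room for a separator: the main script assumes stem and page run together.
    If $room <= 0 Then Return $stem & $page & ".TIF"
    ; One dash, then zeros to fill the remaining characters.
    Local $name = $stem & "-"
    For $i = 2 To $room
        $name &= "0"
    Next
    Return $name & $page & ".TIF"
EndFunc

ConsoleWrite(PageFileName("A", 12, 3) & @CRLF) ; writes A12-0003.TIF
```

Waiting for the name built from the listed page count, then probing the next page numbers, is how the script copes with duplex scans that produce more pages than the list says.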
printrun.bat
This script runs Prn2file before launching the Canofile reader.

c:
cd \canofile
del c:\canofile\print.txt
prn2file.com c:\canofile\print.txt
d:
run.bat
readcano.au3
This is the main script.

;Canofile Disc Extractor
;CCEagles, 2013-2014
#include <debug.au3>
#include <guiConstantsEx.au3>
;_DebugSetup ("Check numbers")
opt("GUIDataSeparatorChar","|")
AutoItSetOption("WinDetectHiddenText", 1)
AutoItSetOption("SendKeyDelay", 500)
$rundisc=true
$runprint=true
GUICreate("Progress", 300, 100)
global $progress
$progress=GUICtrlCreateLabel("Initializing...",10,10,280)
;GUISetState(@SW_SHOW)

if FileExists("c:\windows\temp\*.TIF") = 1 and FileExists("c:\canofile\print.txt") = 1 then
    $q=MsgBox(35,"Process Disc?", "Do you want to get more Tiff files from this disc? I detected Tiff files in the Temp location. If the disc is completely run, click No. If the disc didn't run completely, click Yes. If this is a new disc, click Cancel.")
    if $q=7 then
        $rundisc=false
        $runprint=false
    elseif $q=2 then
        $q=msgbox(36,"Delete temp files?","Do you want to delete the temporary files and start a new disc?")
        if $q<>7 then
            filedelete("c:\windows\temp\*.tif")
            $runDisc=true
            $runPrint=true
        else
            msgbox(0,"Quitting","Quitting. Nothing else to see here.")
            exit 0
        endif
    else
        $runDisc=true
        $runPrint=false
        ; $q=MsgBox(36,"Get List?", "Do you want to get a new disc catalogue from this disc? I detected Tiff files in the Temp location and wouldn't want to overwrite them with a different disc.")
        ; if $q=7 then $runprint=false
    endif
elseif FileExists("c:\canofile\print.txt") = 1 then
    $q=MsgBox(36,"Get List?", "Do you want to get a new disc catalogue from this disc? I detected an existing catalogue.")
    if $q=7 then $runprint=false
endif

if $rundisc=true then
    ; Run the disc
    if $runPrint=true then
        Run("c:\canofile\printrun.bat ","c:\canofile\")
    Else
        Run("d:\run.bat ","d:\")
    EndIf
    WinWaitActive("run - CDROM","",5)
    sleep(8000)
    if $runprint=true Then
        ;get new list print
        ;Side A List
        ;Get LHS of list
        send("{F2}{F10}^p{F3}H")
        sleep(2000)
        ;Get RHS of list
        send("{RIGHT}{RIGHT}{RIGHT}{F3}H")
        sleep(2000)
        ;Side B List
        ;Get LHS of list
        send("{ESC}{F3}{F10}^p{F3}H")
        sleep(2000)
        ;Get RHS of list
        send("{RIGHT}{RIGHT}{RIGHT}{F3}H")
        sleep(2000)
        send("{ESC}")
    endif
    send("{F2}{F10}")
endif

;parse the printed output for jobs...
$f=FileOpen("c:\canofile\print.txt",0)
;$fw=FileOpen("c:\canofile\test.txt",2)
$side=0
global $numFields[2]=[0,0] ;num of fields for each side
global $currentRow[2]=[0,0] ;last record number read for each side
; fieldsArr: Side, FieldNum, startcol/endcol/name
global $fieldsArr[2][20][3]
; rowsArr: Side, Row, Column
global $rowsArr[2][1000][20]
; we expect 4 pages, the first 2 will be side A, the second 2 will be side B. We'll check against the field record that we've got the correct page
$pageNum=0
global $pageFieldStart[5]=[0,1,0,1,0]
Global $lastPageHead=""
While 1
    Local $line = FileReadLine($f)
    If @error = -1 Then ExitLoop
    if StringMid($line,7,1)="|" Then
        ;is it a label?
        if trim(stringLeft($line,6))="Rec" Then
            if $line=$lastPageHead Then
                ;repeated page header: skip its second line as well
                $line=$line&FileReadLine($f)
            else
                $lastPageHead=$line
                $pageNum=$pageNum+1
                if $pageNum >2 then $side=1
                ;get the string to the right of each pipe and put it in the fields array for this side
                $pipepos=7
                while $pipepos > 0
                    $newpipepos=stringinstr($line,"|",1,1,$pipepos+1)
                    if $newpipepos>$pipepos then
                        $numFields[$side]=$numFields[$side]+1
                        $x=$numFields[$side]
                        if $pageFieldStart[$pageNum]=0 then $pageFieldStart[$pageNum]=$x
                        $fieldsArr[$side][$x][0]=$pipepos+1
                        $fieldsArr[$side][$x][1]=$newpipepos
                        $fieldsArr[$side][$x][2]=StringMid($line,$fieldsArr[$side][$x][0],$fieldsArr[$side][$x][1]-$fieldsArr[$side][$x][0])
                        while StringRight($fieldsArr[$side][$x][2],1)=" "
                            $fieldsArr[$side][$x][2]=stringleft($fieldsArr[$side][$x][2],StringLen($fieldsArr[$side][$x][2])-1)
                        WEnd
                    EndIf
                    $pipepos=$newpipepos
                WEnd
                ; read the following line and get the ends of the field names where they exist
                Local $line = FileReadLine($f)
                for $x=$pageFieldStart[$pageNum] to $numFields[$side]
                    $fieldsArr[$side][$x][2]=$fieldsArr[$side][$x][2] & " " & StringMid($line,$fieldsArr[$side][$x][0],$fieldsArr[$side][$x][1]-$fieldsArr[$side][$x][0])
                    while StringRight($fieldsArr[$side][$x][2],1)=" "
                        $fieldsArr[$side][$x][2]=Stringleft($fieldsArr[$side][$x][2],StringLen($fieldsArr[$side][$x][2])-1)
                    WEnd
                Next
            endif
        Else
            ; line must be a data row
            ; Check the side is correct
            if (stringright($line,1)="A" and $pageNum>2) or (stringright($line,1)="B" and $pageNum<=2) Then
                msgbox(0,"Whoops",stringright($line,1)&", "& $pageNum & " - This line belongs to the wrong side of the disc and cannot be imported.")
            endIf
            $currentRow[$side]=stringleft($line,6)
            $currentRow[$side]=trim($currentRow[$side])
            if stringLeft($currentRow[$side],1)="0" then $currentRow[$side]="1"&$currentRow[$side]
            $y=$currentRow[$side]
            for $x=$pageFieldStart[$pageNum] to $numFields[$side]
                ;store the field value with leading and trailing spaces removed
                $rowsArr[$side][$y][$x]=trim(StringMid($line,$fieldsArr[$side][$x][0],$fieldsArr[$side][$x][1]-$fieldsArr[$side][$x][0]))
            Next
        EndIf
    EndIf
WEnd
FileClose($f)

if $rundisc=true then
    for $s=0 to 1
        $pagesField=getFieldNum($s,"No.of Pages")
        if $s=0 Then
            $sideName="A"
        Else
            $sideName="B"
            send("{ESC}{F3}{F10}")
        endif
        for $r=1 to $currentRow[$s]
            $n=StringReplace($rowsArr[$s][$r][$pagesField]," ","")
            ;pad the name out to 8 characters: side, record, dash, zeros, page number
            $l=StringLen($n&$s&$r)
            $pz="-"
            $z=""
            for $pad=$l to 7
                $z=$z&$pz
                $pz="0"
            Next
            $lastFileName=$sideName&$r&$z&$n&".tif"
            $recpos=$r & " of " & $currentRow[$s]-1
            if FileExists("C:\WINDOWS\TEMP\"&$lastFileName) = 0 then
                send("P{F3}E"&$sideName&$r&"-{F10}")
                $nf=$numFields[$s]
                $c=""
                for $b=0 to 9
                    $c=$c&$b&": "&$rowsArr[$s][$r][$b]
                Next
                ;wait for the last expected page to appear
                while FileExists("C:\WINDOWS\TEMP\"&$lastFileName) = 0
                    sleep(2000)
                WEnd
                ;But wait! Are there more files?
                $lastFilePathExists=FileExists("C:\WINDOWS\TEMP\"&$lastFileName)
                $extra=$n
                while $lastFilePathExists = 1
                    if StringLen($extra+1)>stringLen($extra) then $z=StringLeft($z,stringLen($z)-1)
                    $extra=$extra+1
                    $lastFileName=$sideName&$r&$z&$extra&".tif"
                    $lastFilePathExists=FileExists("C:\WINDOWS\TEMP\"&$lastFileName)
                    if $lastFilePathExists = 0 then
                        sleep(3000)
                        $lastFilePathExists=FileExists("C:\WINDOWS\TEMP\"&$lastFileName)
                    endif
                WEnd
                sleep(1000)
                if Mod($r,20)=0 Then
                    send("D{pgdn}")
                Else
                    send("{UP}D")
                endif
            Else
                if Mod($r,20)=0 Then
                    send("{pgdn}")
                Else
                    send("{DOWN}")
                endif
            endif
        Next
    Next
    ;Exit the batch window
    sleep(3000)
    send("{ESC}{ESC}X")
    sleep(5000)
    send("^C")
    ;end of Canofile operations
endif

; prompt for saving options
Global $jobAName1Sel
Global $jobAName2Sel
Global $jobAName3Sel
Global $jobBName1Sel
Global $jobBName2Sel
Global $jobBName3Sel
Global $discName
CreateWindow()
;Iterate Side names
GUISetState(@SW_SHOW)
for $s=0 to 1
    if $s=0 Then
        $sideName="A"
    else
        $sideName="B"
    endif
    ;Iterate rows on each side
    for $r=1 to $currentRow[$s]
        ;Determine appropriate job name
        $filestring=""
        if $s=0 then
            if $JobAName1Sel <> "<none>" then
                $b=StringLeft($JobAName1Sel,stringInstr($JobAName1Sel,".")-1)
                $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
            endif
            if $JobAName2Sel <> "<none>" then
                $b=StringLeft($JobAName2Sel,stringInstr($JobAName2Sel,".")-1)
                $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
            endif
            if $JobAName3Sel <> "<none>" then
                $b=StringLeft($JobAName3Sel,stringInstr($JobAName3Sel,".")-1)
                $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
            endif
        else
            if $JobBName1Sel <> "<none>" then
                $b=StringLeft($JobBName1Sel,stringInstr($JobBName1Sel,".")-1)
                $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
            endif
            if $JobBName2Sel <> "<none>" then
                $b=StringLeft($JobBName2Sel,stringInstr($JobBName2Sel,".")-1)
                $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
            endif
            if $JobBName3Sel <> "<none>" then
                $b=StringLeft($JobBName3Sel,stringInstr($JobBName3Sel,".")-1)
                $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
            endif
        endif
        $filestring=stringleft($filestring,128) & $sidename & $r
        $filestring=StringReplace($filestring,"\","-")
        $filestring=StringReplace($filestring,"/","-")
        $filestring=StringReplace($filestring,":","-")
        $filestring=StringReplace($filestring,'"',"-")
        if $discName <> "" then $fileString=$discName & "\" & $fileString
        ;find files in Temp and move them
        $pagesField=getFieldNum($s,"No.of Pages")
        $n=trim($rowsArr[$s][$r][$pagesField])
        if $n>=1000 and $r>=100 Then
            $lastTif=$sidename & $r &$n&".TIF"
        Else
            $lastTif=$sidename & $r & "-"
            for $i=stringlen($n)+stringLen($r) to 5
                $lastTif=$lastTif & "0"
            Next
            $lastTif=$lastTif&$n&".TIF"
        endif
        $filesFound=FileFindFirstFile ("c:\WINDOWS\TEMP\"&$lastTif)
        if $filesFound<>-1 Then
            ;release the search handle; we only needed to know the file exists
            FileClose($filesFound)
            GUICtrlSetData($progress,$filestring & "\" & $lastTif)
            if $r>=100 and $n>=1000 then
                ;msgbox(0,"Over 1000",$filestring&" will contain "&$n&" records")
                $fresult=filemove ("c:\WINDOWS\TEMP\" & $sidename & $r & "*.TIF","c:\canofile\" & $filestring & "\",8)
            Else
                $fresult=filemove ("c:\WINDOWS\TEMP\" & $sidename & $r & "-*.TIF","c:\canofile\" & $filestring & "\",8)
            endif
            if $fresult=0 then msgbox(1,"Whoops","Couldn't move files to "&$filestring &" . Please manually create a folder under c:\canofile\ to place these files in, then move the files c:\windows\temp\" & $sidename & $r & "*.TIF manually, before clicking OK.")
        Else
            GUICtrlSetData($progress,"Skipped "&$filestring& " as c:\WINDOWS\TEMP\"&$lastTif & " has already been moved.")
            sleep(200)
        endif
    Next
Next
GUIDelete()
MsgBox(0,"Complete","Disc processing is complete.")

;strip leading and trailing spaces
func trim($s)
    $t=$s
    while stringleft($t,1)=" "
        ;sleep (200)
        $t=stringRight($t,stringlen($t)-1)
        ;GUICtrlSetData($progress,"left trim before: '"&$s & "' after:'"&$t & "'")
    WEnd
    while stringright($t,1)=" "
        ;sleep (200)
        $t=stringleft($t,stringlen($t)-1)
        ;GUICtrlSetData($progress,"right trim before: '"&$s & "' after:'"&$t & "'")
    WEnd
    return($t)
EndFunc

;return the number of the field called $ft on side $s, or -1 if there isn't one
func getFieldNum($s,$ft)
    ; fieldsArr: Side, FieldNum, startcol/endcol/name
    ;global $fieldsArr[2][20][3]
    for $i=0 to $numFields[$s]
        if $ft=$fieldsArr[$s][$i][2] then Return($i)
    Next
    return(-1)
EndFunc

func CreateWindow()
    ;Initialize variables
    Local $GUIWidth = 500, $GUIHeight = 250
    Local $Edit_1, $OK_Btn, $Cancel_Btn, $msg
    #forceref $Edit_1
    ;Create window
    GUICreate("Select file format", $GUIWidth, $GUIHeight)
    ;side A
    ;Combos
    ; generate list of fields for side A
    $fieldListA="|"
    for $i=1 to $numFields[0]
        $fieldListA=$fieldListA & "|" & $i&". "&$fieldsArr[0][$i][2]
        if $fieldsArr[0][$i][2]="No.of Pages" then $noOfPagesFieldA=$i
    Next
    $jobAName1=GUICtrlCreateCombo ( " ",10, 90 , 200)
    GuiCtrlSetData($jobAName1,$fieldListA," ")
    $jobAName2=GUICtrlCreateCombo ( " ",10, 130 , 200)
    GuiCtrlSetData($jobAName2,$fieldListA," ")
    $jobAName3=GUICtrlCreateCombo ( " ",10, 170 , 200)
    GuiCtrlSetData($jobAName3,$fieldListA," ")
    ;side B
    ;Combos
    ; generate list of fields for side B
    $fieldListB="| "
    for $i=1 to $numFields[1]
        $fieldListB=$fieldListB & "|" & $i&". "&$fieldsArr[1][$i][2]
        if $fieldsArr[1][$i][2]="No.of Pages" then $noOfPagesFieldB=$i
    Next
    $jobBName1=GUICtrlCreateCombo ( "",220, 90 , 200)
    GuiCtrlSetData($jobBName1,$fieldListB," ")
    $jobBName2=GUICtrlCreateCombo ( "",220, 130 , 200)
    GuiCtrlSetData($jobBName2,$fieldListB," ")
    $jobBName3=GUICtrlCreateCombo ( "",220, 170 , 200)
    GuiCtrlSetData($jobBName3,$fieldListB," ")
    $discName=GUICtrlCreateInput( "",10, 50 , 200)
    GuiCtrlSetData($jobAName1,$fieldListA," ")
    GUICtrlCreateLabel("Side A:"&$currentRow[0]&" records",10,5,200,14)
    GUICtrlCreateLabel("Disc Name:",10,35,200,14)
    GUICtrlCreateLabel("File Name part 1:",10,75,200,14)
    GUICtrlCreateLabel("File Name part 2:",10,115,200,14)
    GUICtrlCreateLabel("File Name part 3:",10,155,200,14)
    GUICtrlCreateLabel("Side B:"&$currentRow[1]&" records",220,5,100,14)
    ;Create an "OK" button
    $OK_Btn = GUICtrlCreateButton("OK", 75, 210, 70, 25)
    ;Create a "CANCEL" button
    $Cancel_Btn = GUICtrlCreateButton("Cancel", 165, 210, 70, 25)
    ;Show window/Make the window visible
    GUISetState()
    GUISetState(@SW_SHOW)
    ;Loop until:
    ;- user presses Esc
    ;- user presses Alt+F4
    ;- user clicks the close button
    $l=1
    While $l
        ;After every loop check if the user clicked something in the GUI window
        $msg = GUIGetMsg()
        Select
            ;Check if user clicked on the close button
            Case $msg = $GUI_EVENT_CLOSE
                ;Destroy the GUI including the controls
                GUIDelete()
                ;Exit the script
                Exit
            ;Check if user clicked on the "OK" button
            Case $msg = $OK_Btn
                $discName = GUICtrlRead($discName, 1)
                $jobAName1Sel = GUICtrlRead($jobAName1, 1)
                $jobAName2Sel = GUICtrlRead($jobAName2, 1)
                $jobAName3Sel = GUICtrlRead($jobAName3, 1)
                $jobBName1Sel = GUICtrlRead($jobBName1, 1)
                $jobBName2Sel = GUICtrlRead($jobBName2, 1)
                $jobBName3Sel = GUICtrlRead($jobBName3, 1)
                GUIDelete()
                $l=0
            ;Check if user clicked on the "CANCEL" button
            Case $msg = $Cancel_Btn
                ;Destroy the GUI including the controls
                ;Exit the script
                Exit
        EndSelect
    WEnd
EndFunc
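One possible tweak: the main script swaps only \ / : and " for dashes when it builds a folder name, but Windows also rejects * ? < > and | in file and folder names, so a field value containing one of those would make the move fail. A minimal sketch of a broader clean-up follows. SafeFolderName is a hypothetical helper, not something in the script above, and I have not run it against real discs.

```autoit
; Hypothetical helper, not part of the original script: swaps every character
; that Windows rejects in file and folder names for a dash.
Func SafeFolderName($name)
    ; StringSplit returns the element count in [0] and the pieces from [1] on.
    Local $bad = StringSplit('\ / : * ? " < > |', " ")
    For $i = 1 To $bad[0]
        $name = StringReplace($name, $bad[$i], "-")
    Next
    Return $name
EndFunc

ConsoleWrite(SafeFolderName('12/345: "Smith?"') & @CRLF) ; writes 12-345- -Smith--
```

It would have to be applied to the job name before the disc name and its backslash are prefixed, otherwise the path separator would be replaced as well.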
Some final notes
I've found that a CD-ROM drive seems to be a lot more reliable than a DVD-ROM drive for reading the discs we've got, which are between 18 and 22 years old now! It might be one good CD-ROM drive and a couple of bad DVD drives, but that's my experience.
I've now extracted about 13 discs with this system - still another 140 or so to go, but it's steady enough for me to work with now. If you do have any tweaks or suggestions, feel free to comment!
Also, if you've got a better way of doing this, such as being able to parse the monolithic files on the discs themselves, or using another program I've not heard of, I'm all ears.
Very finally, I've never had an MO drive to play with, and we only have one MO disc in the office. I'm not sure what format the discs are in, but on our CDs we have 2 monolith files called A and B, and some supporting files called 1.A, 1.B, 2.A, 2.B etc., which appear to be index files of between 1 and 50K each. If you can pull such a set of files from an MO disc, we might be able to pull the data from the MO discs. There might be nothing in it, but I'm almost tempted to track down an MO drive just to see if I can read this one disc!
Probably too late given the date of your post, but I worked for the company that wrote the Pocket-Canofile software. The A and B files are sector dumps of both sides of the source MO discs, and the numbered files are indexes created to make searching quicker in the viewer; they don't exist on the MO discs. The viewer detects whether it's being run from a CD as a crude security device, but there was a secret command-line option to bypass it (from memory it was possibly /V:driveletter, so if on D: it's /V:D), which may allow the discs to be used on DOSBox. If there's an option to make DOSBox see a hard drive directory as a CD, that may also allow it to run there.