Canofile hell - my clunky workaround

Background

We've had paper archiving on Canofile discs for over 20 years - and for nearly 20 years we've had Canofile data on a Pocket-Canofile disc system, where the original Canofile MO discs were copied to a CD with a reader program included.  This was clunky but basically OK in the days of DOS, all the way to Windows ME.


The problem is that they've never worked properly from Windows XP onwards (you could run the program but not see the contents of the disc), and now that we've got 64-bit Windows everywhere, they don't run at all.


I appreciate this is a different problem to that posed to people with actual MO discs, but for those with DOS Pocket-Canofile discs, and no copy of Canofile for Windows (and I have tried to get one), my admittedly clunky solution might just work.


Stuff I tried that didn't work

DOS boot stick / CD

This sometimes works - get FreeDOS or MS-DOS to boot from a memory stick or similar, and save whatever job you're looking for on the stick, then boot back into Windows.  The problems I had here were that the discs don't work on more modern hardware, so the problem was similar to Windows XP+ issues.  Added to that, even when it did work, there isn't a way I've found of extracting the contents of the disc automatically in DOS, which means either going through this process every time you want to retrieve a job, or someone will have to sit there and do it all manually.  With over 150 discs with between 80 and 300 records on each disc, that's going to be tedious.

DOSBox, DOS or Windows 98 as a virtual machine

All non-starters.  The discs refuse to work.


So what did work, then?

What we need is an actual Windows 98 computer.  That's not something most people want sat on their network full-time, and getting hardware with drivers that actually work is tricky.  However, once you've got one, you can access the discs.  You need to get the CD-ROM drive to work reading discs, and either the network to work, or some other way of getting the exported data off the Windows 98 computer, be that a CD/DVD burner, USB mass storage support, whichever.


If you've got one of these, you can interrogate the discs on a one-off basis as I mentioned in the DOS boot stick section, but to automatically rip the whole disc, we need some severe script-fu.

AutoIt

In the end I used a scripting language called AutoIt.  It's pretty powerful, and can send keystrokes to programs and in some cases parse windows and things like that, and create dialogue boxes etc. The latest version doesn't work with Win9x systems, but version 3.2.12.1 does, and is available from http://www.autoitscript.com/autoit3/files/archive/autoit/autoit-v3.2.12.1-setup.exe

Prn2File

Unfortunately, the Canofile program uses a graphical text mode which means we can't screen-scrape to work out what the program is doing.  We can print lists of what's on the disc though, and using prn2file.com, can redirect the print for parsing by the script.  Prn2file is available here: http://homepages.rootsweb.ancestry.com/~fhcnet/fhctech/print2file.html
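
For anyone who wants to post-process print.txt away from the Windows 98 machine, the column-splitting idea used by the parser in readcano.au3 can be sketched in a few lines of Python.  This is only a sketch and not part of the AutoIt script: the sample header and row are invented to match what the parser appears to expect (a " Rec  " label, a pipe in column 7, then pipe-delimited fields), the function names header_fields and row_values are my own, and positions here are 0-based where the AutoIt code is 1-based.

```python
def header_fields(header):
    """Return (start, end, name) for each pipe-delimited column after the record number."""
    fields = []
    pos = header.find("|")
    while pos != -1:
        nxt = header.find("|", pos + 1)
        if nxt == -1:
            break  # no closing pipe, so no further complete column
        fields.append((pos + 1, nxt, header[pos + 1:nxt].rstrip()))
        pos = nxt
    return fields


def row_values(line, fields):
    """Slice a data row at the column positions taken from the header."""
    return [line[start:end].strip() for start, end, _name in fields]


# Hypothetical sample lines, invented for illustration only.
header = " Rec  |" + "Job No".ljust(10) + "|" + "No.of Pages".ljust(11) + "|"
row = "1".rjust(5) + " |" + "J1234".ljust(10) + "|" + "12".rjust(11) + "|"

fields = header_fields(header)
print([name for _start, _end, name in fields])  # ['Job No', 'No.of Pages']
print(row_values(row, fields))                  # ['J1234', '12']
```

Taking the column positions from the header rather than splitting each row on pipes mirrors what the script does, and it still works if a field value happens to contain a pipe character.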

 

The Script

Note that the script is still fairly rough-and-ready, though I've taken it through several discs without any notable problems.  I know the code is sloppy - it is my first AutoIt script after all, so has bits from all over the place in it.  I'll update it as I refine it over the next couple of months.  Please feel free to come up with improvements!


Some working assumptions:  Your scripts and supporting files such as PRN2FILE are stored in c:\CANOFILE , and your CD-ROM drive is D:\
 
So, very roughly, the script runs like this during a full run:
 
  1. Run Prn2File, then the Canofile program, and wait for it to start
  2. Print a list for each side to c:\canofile\print.txt.  We need to get the left and right-hand sides of the list, so we have a file name and a number of pages to aim for.  We do this for side A and B (a relic from the MO format).
  3. Parse the list so we know how many records to process and how many pages are in each record
  4. Iterate through each record on each side and extract all the pages as TIFFs.  Wait for the highest known page number, then wait a few seconds for extra pages, as the page count isn't accurate if there are duplex scans.  As we're limited to DOS 8.3 names, extract based on disc record number, all to the %temp% folder (due to an AutoIt bug with pressing Enter, which means we can't reliably set the export path to anything else)
  5. Once all the records are extracted, present a dialogue box asking how you want to name the folders with jobs in.  This is necessary because the first file field doesn't always contain the job number, and the fields can be different between side A and side B too.
  6. Move the files from %temp% to c:\canofile\<disc name>\<job folder><recordNum>.  We add the <recordNum> to avoid overwriting identically named jobs on the disc.

The script also has some logic for recovering a partly extracted disc, or a disc which was fully extracted but didn't quit properly, or Windows crashed or similar (working with Windows 98 again makes you realise just how unreliable it was!).  The logic isn't completely foolproof though, and any suggestions for improvements would be welcome.
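
To make the 8.3 naming in step 4 concrete, here is a small Python sketch of the file name the script waits for as the last page of a record.  It is not part of the AutoIt script and won't run on the Windows 98 box; it simply restates the padding logic from readcano.au3 as I read it (side letter, record number, a dash, zero padding, page number, eight characters in all).  The function name expected_tif_name is made up for illustration.

```python
def expected_tif_name(side, record, pages):
    """Build the 8.3 name expected for the last known page of a record."""
    digits = len(str(record)) + len(str(pages))
    filler_len = 7 - digits  # 8 characters, minus the side letter and the digits
    # The filler is a dash followed by zeros; it vanishes when the digits fill the name.
    filler = "-" + "0" * (filler_len - 1) if filler_len > 0 else ""
    return f"{side}{record}{filler}{pages}.TIF"


print(expected_tif_name("A", 12, 7))       # A12-0007.TIF
print(expected_tif_name("B", 100, 1000))   # B1001000.TIF
```

The second example shows why the move step has a special case for large records: with a three-digit record number and a four-digit page count there is no room left for the dash.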
 
Once running, a typical disc takes between 45 minutes and 2 hours to extract and can be left alone while it runs, though if you're like me you'll just stare in wonder at it instead of doing something useful.  Part of the reason it takes so long is all the sleep times I've had to introduce - as we're effectively controlling the program blind, we need to be sure it's reached the place it should have reached.  If the disc has stopped doing anything at all for more than a minute or so, quit the Canofile program, quit the script (there will be an AutoIt icon in the system tray near the clock that you can select Exit from), then run the script again, and tell it you want to extract more Tiffs from the disc.
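
One possible improvement: the wait loop in the script polls for the last expected file with no upper limit, which is why a stalled disc has to be spotted by eye.  A wait with a timeout might look something like the Python sketch below.  This only illustrates the idea, it is not a drop-in change to the AutoIt script (a port would need an equivalent elapsed-time check), and the 60 second and 2 second defaults are guesses based on the "minute or so" rule of thumb above.  The name wait_for_file is hypothetical.

```python
import os
import time


def wait_for_file(path, timeout_s=60.0, poll_s=2.0):
    """Poll until path exists; return False if it has not appeared within timeout_s."""
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False  # looks stalled: let the caller decide what to do next
        time.sleep(poll_s)
    return True
```

A caller could then warn the user, or quit cleanly so the recovery logic can pick up on the next run, instead of waiting forever.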


printrun.bat

This script runs Prn2file before launching the Canofile reader.
 
c:
cd \canofile
del c:\canofile\print.txt
prn2file.com c:\canofile\print.txt
d:
run.bat

readcano.au3

This is the main script.
;Canofile Disc Extractor
;CCEagles, 2013-2014
#include <debug.au3>
#include <guiConstantsEx.au3>
;_DebugSetup ("Check numbers")
opt("GUIDataSeparatorChar","|")
AutoItSetOption("WinDetectHiddenText", 1)
AutoItSetOption("SendKeyDelay", 500)
$rundisc=true
$runprint=true
GUICreate("Progress", 300, 100)
global $progress
$progress=GUICtrlCreateLabel("Initializing...",10,10,280)
;GUISetState(@SW_SHOW)

if FileExists("c:\windows\temp\*.TIF") = 1 and FileExists("c:\canofile\print.txt") = 1 then
 $q=MsgBox(35,"Process Disc?", "Do you want to get more Tiff files from this disc?  I detected Tiff files in the Temp location.  If the disc is completely run, click No.  If the disc didn't run completely, click Yes.  If this is a new disc, click Cancel.")
 if $q=7 then 
  $rundisc=false
  $runprint=false
 elseif $q=2 then
  $q=msgbox(36,"Delete temp files?","Do you want to delete the temporary files and start a new disc?")
  if $q<>7 then
   filedelete("c:\windows\temp\*.tif")
   $runDisc=true
   $runPrint=true
  else
   msgbox(0,"Quitting","Quitting.  Nothing else to see here.")
   exit 0
  endif
 else
  $runDisc=true
  $runPrint=false
 ; $q=MsgBox(36,"Get List?", "Do you want to get a new disc catalogue from this disc?  I detected Tiff files in the Temp location and wouldn't want to overwrite them with a different disc.")
 ; if $q=7 then $runprint=false
 endif
elseif FileExists("c:\canofile\print.txt") = 1 then
 $q=MsgBox(36,"Get List?", "Do you want to get a new disc catalogue from this disc?  I detected an existing catalogue.")
 if $q=7 then $runprint=false
endif

if $rundisc=true then
; Run the disc


if $runPrint=true then 
 Run("c:\canofile\printrun.bat ","c:\canofile\")
Else
 Run("d:\run.bat ","d:\")
EndIf

WinWaitActive("run - CDROM","",5)
sleep(8000)


if $runprint=true Then
;get new list print

;Side A List
;Get LHS of list

send("{F2}{F10}^p{F3}H")
sleep(2000)

;Get RHS of list
send("{RIGHT}{RIGHT}{RIGHT}{F3}H")
sleep(2000)

;Side B List
;Get LHS of list
send("{ESC}{F3}{F10}^p{F3}H")
sleep(2000)
;Get RHS of list
send("{RIGHT}{RIGHT}{RIGHT}{F3}H")
sleep(2000)
send("{ESC}")
endif

send("{F2}{F10}")
endif

;parse the printed output for jobs...
$f=FileOpen("c:\canofile\print.txt",0)
;$fw=FileOpen("c:\canofile\test.txt",2)
$side=0
global $numFields[2]=[0,0] ;num of fields for each side
global $currentRow[2]=[0,0] ;last record number read for each side
; fieldsArr: Side, FieldNum, startcol/endcol/name
global $fieldsArr[2][20][3]
; rowsArr: Side, Row, Column
global $rowsArr[2][1000][20]
; we expect 4 pages, the first 2 will be side A, the second 2 will be side B.  We'll check against the field record that we've got the correct page
$pageNum=0
global $pageFieldStart[5]=[0,1,0,1,0]
Global $lastPageHead=""
While 1
   Local $line = FileReadLine($f)
   If @error = -1 Then ExitLoop
   if StringMid($line,7,1)="|" Then
   ;is it a label?
   if stringLeft($line,6)=" Rec  " Then
  if $line=$lastPageHead Then
   $line=$line&FileReadLine($f)
  else
  $lastPageHead=$line
   $pageNum=$pageNum+1
   if $pageNum >2 then $side=1
   ;get the string to the right of each pipe and put it in the fields array for this side
   $pipepos=7
   while $pipepos > 0
   $newpipepos=stringinstr($line,"|",1,1,$pipepos+1)
   if $newpipepos>$pipepos then 
      $numFields[$side]=$numFields[$side]+1
      $x=$numFields[$side]
      if $pageFieldStart[$pageNum]=0 then $pageFieldStart[$pageNum]=$x
      $fieldsArr[$side][$x][0]=$pipepos+1
      $fieldsArr[$side][$x][1]=$newpipepos
      $fieldsArr[$side][$x][2]=StringMid($line,$fieldsArr[$side][$x][0],$fieldsArr[$side][$x][1]-$fieldsArr[$side][$x][0])
      while StringRight($fieldsArr[$side][$x][2],1)=" "
      $fieldsArr[$side][$x][2]=stringleft($fieldsArr[$side][$x][2],StringLen($fieldsArr[$side][$x][2])-1)
      WEnd
      
   EndIf
   $pipepos=$newpipepos
   WEnd
   ; read the following line and get the ends of the field names where they exist
   Local $line = FileReadLine($f)
   for $x=$pageFieldStart[$pageNum] to $numFields[$side]
   $fieldsArr[$side][$x][2]=$fieldsArr[$side][$x][2] & " " & StringMid($line,$fieldsArr[$side][$x][0],$fieldsArr[$side][$x][1]-$fieldsArr[$side][$x][0])
   while StringRight($fieldsArr[$side][$x][2],1)=" "
      $fieldsArr[$side][$x][2]=Stringleft($fieldsArr[$side][$x][2],StringLen($fieldsArr[$side][$x][2])-1)
   WEnd
  Next
  endif
   Else
   ; line must be a data row
   ; Check the side is correct
   if (stringright($line,1)="A" and $pageNum>2) or (stringright($line,1)="B" and $pageNum<=2) Then
   msgbox(0,"Whoops",stringright($line,1)&", "& $pageNum & " - This line belongs to the wrong side of the disc and cannot be imported.")
   endIf
   $currentRow[$side]=stringleft($line,6)
   $currentRow[$side]=trim($currentRow[$side])
   
  if stringLeft($currentRow[$side],1)="0" then $currentRow[$side]="1"&$currentRow[$side]
   $y=$currentRow[$side]
   for $x=$pageFieldStart[$pageNum] to $numFields[$side]
   ; store the field value with leading and trailing spaces removed
   $rowsArr[$side][$y][$x]=trim(StringMid($line,$fieldsArr[$side][$x][0],$fieldsArr[$side][$x][1]-$fieldsArr[$side][$x][0]))
   Next
   EndIf
   EndIf
 
WEnd
FileClose($f)

if $rundisc=true then
for $s=0 to 1
 $pagesField=getFieldNum($s,"No.of Pages")
 if $s=0 Then
  $sideName="A"
 Else
  $sideName="B"
  send("{ESC}{F3}{F10}")
 endif
 for $r=1 to $currentRow[$s]
  $n=StringReplace($rowsArr[$s][$r][$pagesField]," ","")
  $l=StringLen($n&$s&$r)
  $pz="-"
  $z=""
  for $pad=$l to 7
   $z=$z&$pz
   $pz="0"
  Next
  $lastFileName=$sideName&$r&$z&$n&".tif"
  $recpos=$r & " of " & $currentRow[$s]-1
  if FileExists("C:\WINDOWS\TEMP\"&$lastFileName) = 0 then
   send("P{F3}E"&$sideName&$r&"-{F10}")
   $nf=$numFields[$s]
   $c=""
   for $b=0 to 9
    $c=$c&$b&": "&$rowsArr[$s][$r][$b]
   Next
   while FileExists("C:\WINDOWS\TEMP\"&$lastFileName) = 0
    sleep(2000)
   WEnd

   ;But wait!  Are there more files?
   $lastFilePathExists=FileExists("C:\WINDOWS\TEMP\"&$lastFileName)
   $extra=$n
   while  $lastFilePathExists = 1
    if StringLen($extra+1)>stringLen($extra) then $z=StringLeft($z,stringLen($z)-1)
    $extra=$extra+1
    $lastFileName=$sideName&$r&$z&$extra&".tif"
    $lastFilePathExists=FileExists("C:\WINDOWS\TEMP\"&$lastFileName)
    if $lastFilePathExists = 0 then 
     sleep(3000)
     $lastFilePathExists=FileExists("C:\WINDOWS\TEMP\"&$lastFileName)
    endif
   WEnd
   sleep(1000)
   if Mod($r,20)=0 Then
    send("D{pgdn}")
   Else
    send("{UP}D")
   endif
  Else
   if Mod($r,20)=0 Then
    send("{pgdn}")
   Else
    send("{DOWN}")
   endif
  endif
 Next
Next


;Exit the batch window
sleep(3000)
send("{ESC}{ESC}X")
sleep(5000)
send("^C")

;end of Canofile operations
endif
; prompt for saving options
Global $jobAName1Sel
Global $jobAName2Sel
Global $jobAName3Sel
Global $jobBName1Sel
Global $jobBName2Sel
Global $jobBName3Sel
Global $discName
CreateWindow()

;Iterate Side names
GUISetState(@SW_SHOW)
for $s=0 to 1
 if $s=0 Then
  $sideName="A"
 else
  $sideName="B"
 endif

 ;Iterate rows on each side
 for $r=1 to $currentRow[$s]
  ;Determine appropriate job name
  $filestring=""
  if $s=0 then ; side A rows ($s=0) use the side A field selections
   if $JobAName1Sel <> "<none>" then
    $b=StringLeft($JobAName1Sel,stringInstr($JobAName1Sel,".")-1)
    $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
   endif
   if $JobAName2Sel <> "<none>" then
    $b=StringLeft($JobAName2Sel,stringInstr($JobAName2Sel,".")-1)
    $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
   endif
   if $JobAName3Sel <> "<none>" then
    $b=StringLeft($JobAName3Sel,stringInstr($JobAName3Sel,".")-1)
    $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
   endif
  else
   if $JobBName1Sel <> "<none>" then
    $b=StringLeft($JobBName1Sel,stringInstr($JobBName1Sel,".")-1)
    $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
   endif
   if $JobBName2Sel <> "<none>" then
    $b=StringLeft($JobBName2Sel,stringInstr($JobBName2Sel,".")-1)
    $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
   endif
   if $JobBName3Sel <> "<none>" then
    $b=StringLeft($JobBName3Sel,stringInstr($JobBName3Sel,".")-1)
    $filestring=$filestring&trim($rowsArr[$s][$r][$b]) & " "
   endif
  endif
  $filestring=stringleft($filestring,128) & $sidename & $r
  $filestring=StringReplace($filestring,"\","-")
  $filestring=StringReplace($filestring,"/","-")
  $filestring=StringReplace($filestring,":","-")
  $filestring=StringReplace($filestring,'"',"-")
  if $discName <> "" then $fileString=$discName & "\" & $fileString
  ;find files in Temp and move them
  $pagesField=getFieldNum($s,"No.of Pages")
  $n=trim($rowsArr[$s][$r][$pagesField])
  if $n>=1000 and $r>=100 Then
   $lastTif=$sidename & $r &$n&".TIF"
  Else
   $lastTif=$sidename & $r & "-"
   for $i=stringlen($n)+stringLen($r) to 5
    $lastTif=$lastTif & "0"
   Next
   $lastTif=$lastTif&$n&".TIF"
  endif
  $filesFound=FileFindFirstFile ("c:\WINDOWS\TEMP\"&$lastTif)
  if $filesFound<>-1 Then
   FileClose($filesFound) ; release the search handle; we only needed to know a match exists
   GUICtrlSetData($progress,$filestring & "\" & $lastTif)
   if $r>=100 and $n>=1000 then
    ;msgbox(0,"Over 1000",$filestring&" will contain "&$n&" records")
    $fresult=filemove ("c:\WINDOWS\TEMP\" & $sidename & $r & "*.TIF","c:\canofile\" & $filestring & "\",8)
   Else
    $fresult=filemove ("c:\WINDOWS\TEMP\" & $sidename & $r & "-*.TIF","c:\canofile\" & $filestring & "\",8)
   endif
   if $fresult=0 then msgbox(1,"Whoops","Couldn't move files to "&$filestring &" . Please manually create a folder under c:\canofile\ to place these files in, then move the files c:\windows\temp\" & $sidename & $r & "*.TIF manually, before clicking OK.")
  Else
   GUICtrlSetData($progress,"Skipped "&$filestring& " as c:\WINDOWS\TEMP\"&$lastTif & " has already been moved.")
   sleep(200)
  endif
 Next
Next
GUIDelete()
MsgBox(0,"Complete","Disc processing is complete.")

func trim($s)
 $t=$s
 while stringleft($t,1)=" "
  ;sleep (200)
  $t=stringRight($t,stringlen($t)-1)
  ;GUICtrlSetData($progress,"left trim before: '"&$s & "' after:'"&$t & "'")
 WEnd
 while stringright($t,1)=" "
  ;sleep (200)
  $t=stringleft($t,stringlen($t)-1)
  ;GUICtrlSetData($progress,"right trim before: '"&$s & "' after:'"&$t & "'")
 WEnd
 return($t)
EndFunc

func getFieldNum($s,$ft)
 ; fieldsArr: Side, FieldNum, startcol/endcol/name
 ;global $fieldsArr[2][20][3]
 for $i=0 to $numFields[$s]
  if $ft=$fieldsArr[$s][$i][2] then Return($i)
 Next
 return(-1)
EndFunc

func CreateWindow()
   ;Initialize variables
   Local $GUIWidth = 500, $GUIHeight = 250
   Local $Edit_1, $OK_Btn, $Cancel_Btn, $msg

   #forceref $Edit_1

   ;Create window
   GUICreate("Select file format", $GUIWidth, $GUIHeight)

   ;side A
   ;Combos
   ; generate list of fields for side A
   $fieldListA="|";
   for $i=1 to $numFields[0]
   $fieldListA=$fieldListA & "|" & $i&". "&$fieldsArr[0][$i][2]
   if $fieldsArr[0][$i][2]="No.of Pages" then $noOfPagesFieldA=$i
   Next

   $jobAName1=GUICtrlCreateCombo ( "",10, 90 , 200)
   GuiCtrlSetData($jobAName1,$fieldListA,"")
   
   $jobAName2=GUICtrlCreateCombo ( "",10, 130 , 200)
   GuiCtrlSetData($jobAName2,$fieldListA,"")
   
   $jobAName3=GUICtrlCreateCombo ( "",10, 170 , 200)
   GuiCtrlSetData($jobAName3,$fieldListA,"")
   
 ;side B
   ;Combos
   ; generate list of fields for side B
   $fieldListB="|";
   for $i=1 to $numFields[1]
   $fieldListB=$fieldListB & "|" & $i&". "&$fieldsArr[1][$i][2]
   if $fieldsArr[1][$i][2]="No.of Pages" then $noOfPagesFieldB=$i
   Next

   $jobBName1=GUICtrlCreateCombo ( "",220, 90 , 200)
   GuiCtrlSetData($jobBName1,$fieldListB,"")
   
   $jobBName2=GUICtrlCreateCombo ( "",220, 130 , 200)
   GuiCtrlSetData($jobBName2,$fieldListB,"")
   
   $jobBName3=GUICtrlCreateCombo ( "",220, 170 , 200)
   GuiCtrlSetData($jobBName3,$fieldListB,"")
   
   $discName=GUICtrlCreateInput( "",10, 50 , 200)
   GuiCtrlSetData($jobAName1,$fieldListA,"")
   GUICtrlCreateLabel("Side A:"&$currentRow[0]&" records",10,5,200,14)
   GUICtrlCreateLabel("Disc Name:",10,35,200,14)
   GUICtrlCreateLabel("File Name part 1:",10,75,200,14)
   GUICtrlCreateLabel("File Name part 2:",10,115,200,14)
   GUICtrlCreateLabel("File Name part 3:",10,155,200,14)
   GUICtrlCreateLabel("Side B:"&$currentRow[1]&" records",220,5,100,14)
   
   ;Create an "OK" button
   $OK_Btn = GUICtrlCreateButton("OK", 75, 210, 70, 25)

   ;Create a "CANCEL" button
   $Cancel_Btn = GUICtrlCreateButton("Cancel", 165, 210, 70, 25)

   ;Show window/Make the window visible
   GUISetState()
   GUISetState(@SW_SHOW)

   ;Loop until:
   ;- user presses Esc
   ;- user presses Alt+F4
   ;- user clicks the close button
   $l=1
   While $l
   ;After every loop check if the user clicked something in the GUI window
   $msg = GUIGetMsg()

   Select

   ;Check if user clicked on the close button
   Case $msg = $GUI_EVENT_CLOSE
   ;Destroy the GUI including the controls
   GUIDelete()
   ;Exit the script
   Exit

   ;Check if user clicked on the "OK" button
  Case $msg = $OK_Btn
  $discName = GUICtrlRead($discName, 1)
  $jobAName1Sel = GUICtrlRead($jobAName1, 1)
  $jobAName2Sel = GUICtrlRead($jobAName2, 1)
  $jobAName3Sel = GUICtrlRead($jobAName3, 1)
  $jobBName1Sel = GUICtrlRead($jobBName1, 1)
  $jobBName2Sel = GUICtrlRead($jobBName2, 1)
  $jobBName3Sel = GUICtrlRead($jobBName3, 1)
   
   GUIDelete()
   $l=0
   ;Check if user clicked on the "CANCEL" button
   Case $msg = $Cancel_Btn
   ;Destroy the GUI including the controls

   ;Exit the script
   Exit
  EndSelect

 WEnd  
EndFunc

Some final notes

I've found that a CD-ROM drive seems to be a lot more reliable than a DVD-ROM drive for reading the discs we've got, which are between 18 and 22 years old now!  It might be a good CD-ROM and bad couple of DVD drives, but that's my experience.

I've now extracted about 13 discs with this system - still another 140 or so to go, but it's steady enough for me to work with now.  If you do have any tweaks or suggestions, feel free to comment!

Also, if you've got a better way of doing this, such as being able to parse the monolithic files on the discs themselves, or using another program I've not heard of, I'm all ears.

Very finally, I've never had an MO drive to play with, and we only have one MO disc in the office.  I'm not sure what format the discs are in, but on our CDs we have 2 monolith files called A and B, and some supporting files called 1.A, 1.B, 2.A, 2.B etc., which appear to be index files of between 1 and 50K each.  If you can pull such a set of files from an MO disc, we might be able to pull the data from the MO discs.  There might be nothing in it, but I'm almost tempted to track down an MO drive just to see if I can read this one disc!

1 comment:

  1. Probably too late given date of your post but I worked for the company who Wrote the pocket Canofile software. The A and B files are sector dumps of both sides of the source MO discs and the numbered files are indexes created to make searching quicker in the viewer, they don't exist on the MO discs. The viewer detects if it's being run from a CD as a crude security device but there was a secret commandline option to bypass it (from memory it was possibly /V:driveletter so if on D: it's /V:D) which may allow the discs to be used on dosbox. If there's an option to make Dosbox see a hard drive directory as a CD that may also allow it to run there.
