Help put back issues online

We are trying to scan and upload all back issues of Tanzanian Affairs (dating back to 1978). The process consists of:

1) scanning an old issue to a pdf file (to produce something like this)

2) character recognition in the pdf file and exporting a text file (to produce something like this – this is a particularly poor example).

3) correcting and editing the text file (to produce something like this).

4) uploading each article onto the website.

Stage (3) is where volunteers would be most appreciated.

If anyone is kind enough to volunteer, I will email them the raw text file and scan pdf, and then they can email me back the final text file when ready (preferably within a month or so!).

Suggested Instructions for stage 3

This is how I tend to tidy up the file, but you can do it any way you want, provided the end result is the same!

a) Remove additional blank spaces from the text. I can’t find a way of doing this automatically, and the best solution I have so far is to use Word: open Edit->Replace, find “ ” (i.e. a space) and replace with “” (i.e. nothing). Then use Alt-R if the cursor is on a space you want to delete, and Alt-F (find next space) if the cursor is on a space that you need to keep. It is still rather laborious. I find it best not to bother going back to correct mistakes, but rather to leave them to stage (b).

b) Second run through the document, correcting any errors from (a) and any obvious “spelling mistakes” from the character recognition (eg brakfng instead of braking, ERRILIANT instead of BRILLIANT, “~~uardiano”n instead of “Guardian” on). Add a blank line between each paragraph, and add in the titles of the articles.

c) Final thorough run through, checking against the original pdf. Please check names and numbers especially, since there are often errors.
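
For volunteers comfortable with a little scripting, one part of step (a) can probably be automated: collapsing runs of several spaces into one and trimming spaces at the ends of lines. The following is only a rough sketch in Python, not a tested part of the process, and the function name collapse_spaces is made up for illustration. It does not detect stray spaces inside words, which still need to be judged by eye, and it keeps every line break in place so the text can still be compared against the pdf.

```python
import re


def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs to a single space and trim each line.

    Line breaks are left untouched so the text still lines up with the scan.
    Hypothetical helper for illustration only.
    """
    cleaned_lines = []
    for line in text.split("\n"):
        # Replace any run of spaces or tabs with exactly one space.
        line = re.sub(r"[ \t]+", " ", line)
        # Drop spaces left over at the start or end of the line.
        cleaned_lines.append(line.strip())
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    sample = "The  quick   brown\nfox  jumps "
    print(collapse_spaces(sample))
```

Single spaces between words are kept, so this should be safe to run before the manual pass, which then only has to deal with spaces that have crept into the middle of words.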

Please DON’T remove the line breaks at the end of each line, because that makes it much harder to compare against the original pdf file. I will remove them just before uploading onto the site.
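
For the curious, removing the line breaks at the very end can be done with a short script rather than by hand. The sketch below is one possible way in Python, not necessarily how it is actually done for the site. It assumes paragraphs are separated by blank lines, as asked for in step (b), and the function name unwrap_paragraphs is invented for this example.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the lines of each paragraph into one line.

    Assumes paragraphs are separated by one or more blank lines.
    Hypothetical helper for illustration only.
    """
    # Split on blank lines (possibly containing stray whitespace).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        # Join the non-empty lines of this paragraph with single spaces.
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        joined.append(" ".join(lines))
    # Keep one blank line between paragraphs.
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "First line\nsecond line\n\nNew paragraph\ncontinues here\n"
    print(unwrap_paragraphs(sample))
```

This only works well once the blank lines between paragraphs are in place, which is another reason to leave the line breaks alone until the text has been fully checked.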