Technical Notes for HenrysRecords.org

The code that runs HenrysRecords.org is under version control (using Subversion) at http://svn.red-bean.com/repos/henrysrecords/trunk/. The search code itself is simple and self-explanatory. The more complex part is converting the data from its native home (a Symantec Q&A database running under MS-DOS) to MySQL tables served by Apache/GNU/Linux. This documentation covers that process.


Export the data from Q&A, and save to CD.

This section is specifically written for Henry Fogel's computer, and is probably not useful anywhere else.

Export from Q&A:

  1. Start up Q&A.

  2. Go to File->Utilities->Export Data->Standard Ascii and hit Enter.

  3. You will be prompted for filenames to export from and to. Fill them in, for example like this:

       QA Filename:  C:\QA4\INSTRUM.DTF
         Export to:  C:\QA4\INSTRUM.ASC
    

    When you hit Enter after the second filename, Q&A may pop up an alert box saying that there is already a file of that name, and asking if you want to overwrite it. Say "Yes", but make sure that you're overwriting a ".ASC" file, not a ".DTF" file.

    (For the first filename, you don't actually need to type ".DTF". You can just give the front part of the name and hit Enter, Q&A will fill in the rest. However, when typing the second name, it is important that you type the ".ASC".)

  4. Hit F10 twice.

  5. Now you will be asked if you want to change the default export settings. We need to change two of them: in the first line, say no quotes, and in the second line, use return (not comma) as the field delimiter. You can leave the third line, export field template, set to no.

  6. Hit F10 again.

  7. Lather, rinse repeat: The above steps should be run for each of

         INSTRUM.DTF
            ORCH.DTF
         RECORDS.DTF
         CHAMBER.DTF
            SONG.DTF
           OPREC.DTF
           OPERA.DTF
    
  8. Exit Q&A.

Save to CD:

  1. Put a blank CD-ROM into the top drive.

  2. When a new window pops up, choose "Open Writeable CD Folder" and click OK.

  3. When the writeable CD folder comes up, click "Folders" on the toolbar.

  4. Drag the entire QA4 folder from the left side into the big white area on the right (during this process, "CD Drive D:" should remain gray-highlighted on the left — the highlighted item on the left is the destination that the white area on the right corresponds to).

  5. On the toolbar, choose "File->Write these files to CD".

That's it. Now give the CD to Karl, who will do the steps below...


Publish the data.

These steps should be performed in the data/ subdirectory in the live web area on the server, after the .asc files have been copied there:

  1. Remove the ^Z from the bottom of each .asc file (just that one character — don't remove the newline preceding it).

  2. Convert the files from DOS-style CRLF to Unix LF line endings.

  3. Massage instrum.asc and orch.asc into the @instrum_abbrevs and @orch_abbrevs variables in asc-to-xml.pl. That's right, just put them in the source code, this ain't no Grand Hotel. Remember that the expansions need to be in double quotes, since some of the data may contain single quotes.

  4. Make sure that @instrum_abbrevs contains these two extra abbreviations:

      ["Orch", "Orchestra"],
      ["Ch Orch", "Chamber Orchestra"]
    

    (These mappings will probably not be present in instrum.asc. We use them in the online version to avoid exposing the public to our internal shorthands.)

  5. Search for empty strings ("") in orch.asc, as there are some problems in the data, and the correct entry immediately preceding or following a problem may be affected, depending on how the transformation macro was written. Watch out for multiple contradictory entries for "Ens" in @orch_abbrevs. Convert all "Orch" to "Orchestra", but do it carefully -- some already say "Orchestra" or "Orchestre". Same for "Symph", "Phil", and "Univ". Finally, search for periods, as these are usually abbreviations (though we leave "St." alone).

  6. Make sure the mysql user henryrec exists, and has full access to an existing database also named henryrec. Hint:

             $ mysql -u root -p
             Password: *******
             mysql> grant select on henryrec.* to henryrecro@localhost 
                      identified by 'RO_PASSWORD';
             mysql> grant all on henryrec.* to henryrecrw@localhost 
                      identified by 'RW_PASSWORD';
             mysql> ^D
             $ 
    
    (We keep the read-write user separate from the read-only user to avoid SQL injection attacks.)

  7. Set up a config file in the web server directory where search.py will run from (usually the parent of the data/ directory):

       [database]
       db_name = henryrec
       ro_user = henryrecro
       ro_pass = RO_PASSWORD
       rw_user = henryrecrw
       rw_pass = RW_PASSWORD
    

    Twak the web server configuration appropriately, making sure to block client access to the config file:

    <Directory /path/to/where/henrysrecords/is/set/up>
        Options +ExecCGI
        AllowOverride None
        AddHandler cgi-script py cgi pl
        <Files config>
          Deny From All
        </Files>
    </Directory>
    

    Don't forget to restart the server.

  8. Run 'make'. Then maybe go over the "CHECK" output on stderr to make sure there are no problematic titles. (Search for the word "gotcha" in asc-to-xml.pl for more explanation about that.) Running 'make' runs these three steps in order, but you could run them by hand, if you wanted to check the output at each stage:

       1. 'make xml'  -->  converts .asc files to .xml
       2. 'make sql'  -->  converts .xml files to .sql
       3. 'make db'   -->  loads the .sql into the database proper
    

    The Makefile also does a bit of renaming, because we don't publish one of the opera databases (it's idiosyncratic, incomplete, and mainly for Henry's private use) but we do publish the main opera database. See the Makefile for details.

    (Note that all data cleaning occurs in the first step. This is because we want the .xml files to represent exactly the same data as is in the MySQL database, because we encourage people to download the XML and use it as a local master. Thus, the `asc-to-xml.pl' script is much more complex than `xml_to_sql.py'.)

  9. Tweak the "last updated" date in index.html appropriately, and commit.

This completes the conversion. The database is now upgraded.