Technical Notes for HenrysRecords.org |
The HenrysRecords.org code is under version control (check out or browse). If you'd like to suggest a patch, please contact us.
The search code itself is simple and self-explanatory. The more complex part is converting the data from its native home (a Symantec Q&A database running inside an MS-DOS emulator under a modern version of MS-Windows) to MySQL tables served by Apache/GNU/Linux. This documentation covers that process.
Save the QA4 folder to CD:
Put a blank CD-ROM into the top drive.
When a new window pops up, choose "Open Writeable CD Folder" and click OK.
When the writeable CD folder comes up, click "Folders" on the toolbar.
Make sure the "QA4" folder is visible on the left — you might have to click the "+" sign next to the "C:" folder to open it up enough to see the QA4 folder.
Drag the entire QA4 folder from the left side into the big white area on the right (during this process, "CD Drive D:" should remain gray-highlighted on the left — the highlighted item on the left is the destination that the white area on the right corresponds to).
On the toolbar, choose "File->Write these files to CD".
That's it. Now give the CD to Karl, who will do the steps below...
The commands in this section should be issued under DOS, or within dosemu under GNU/Linux. (We used to run these steps on Henry Fogel's computer, but later discovered that dosemu was good enough to run Q&A.)
Start up Q&A (under dosemu, dosemu QA1.EXE).
Go to File->Utilities->Export Data->Standard Ascii and hit Enter.
You will be prompted for filenames to export from and to. Fill them in, for example like this:
QA Filename: C:\QA4\INSTRUM.DTF Export to: C:\QA4\INSTRUM.ASC
When you hit Enter after the second filename, Q&A may pop up an alert box saying that there is already a file of that name, and asking if you want to overwrite it. Say "Yes", but make sure that you're overwriting a ".ASC" file, not a ".DTF" file.
(For the first filename, you don't actually need to type ".DTF". You can just give the front part of the name and hit Enter, Q&A will fill in the rest. However, when typing the second name, it is important that you type the ".ASC".)
Hit F10 twice.
Now you will be asked if you want to change the default export settings. We need to change two of them: in the first line, say no quotes, and in the second line, use return (not comma) as the field delimiter. You can leave the third line, export field template, set to no.
Hit F10 again.
Lather, rinse repeat: The above steps should be run for each of
INSTRUM.DTF ORCH.DTF RECORDS.DTF CHAMBER.DTF SONG.DTF OPREC.DTF OPERA.DTF
Exit Q&A.
These steps should be performed in the data/ subdirectory in the live web area on the server (which runs Debian GNU/Linux), after the .asc files have been copied there:
Run './transfer QA4_DIRECTORY'.
Run 'make sanitize'.
Massage orch.asc into the ../orch-abbrevs.txt file.
Massage instrum.asc into the @instrum_abbrevs variable in asc-to-xml.pl. That's right, just put it in the source code, this ain't no Grand Hotel. Remember that the expansions need to be in double quotes, since some of the data may contain single quotes.
Make sure @instrum_abbrevs contains this abbreviation:
["Perc", "Percussion"],
(But in general use svn diff to check that no abbreviations get dropped.
Make sure the mysql users henryrecro and henryrecrw exist, and have full access to an existing database named henryrec. Hint:
$ mysql -u root -p Password: ******* mysql> grant select on henryrec.* to henryrecro@localhost identified by 'RO_PASSWORD'; mysql> grant all on henryrec.* to henryrecrw@localhost identified by 'RW_PASSWORD'; mysql> ^D $(We keep the read-write user separate from the read-only user to avoid SQL injection attacks.)
Set up a config file in the web server directory where ../search.py will run from (usually the parent of the ../data directory):
[database] db_name = henryrec ro_user = henryrecro ro_pass = RO_PASSWORD rw_user = henryrecrw rw_pass = RW_PASSWORD
Tweak the web server configuration appropriately, making sure to block client access to the config file:
<Directory /path/to/where/henrysrecords/is/set/up> Options +ExecCGI AllowOverride None AddHandler cgi-script py cgi pl <Files config> Deny From All </Files> </Directory>
Don't forget to restart the server.
Make sure there's a cloned copy of the excellent "Awesomplete" Javascript auto-completion library in the top of the same web server directory:
$ git clone https://github.com/LeaVerou/awesomplete.git
(Really we just need the awesomplete.css and awesomplete.js files there, but right now the code expects to find them in that subdirectory because I developed using a cloned copy of the upstream source. So now we're running that way in production — that's right, not even using the minified versions. The completion lists themselves are so large they surely dwarf the code anyway.)
Run make data. This first removes the final ^Z (Ctrl-Z) character from all the .asc files and converts them from DOS to Unix-style line endings, then does these steps:
1. 'make xml' --> converts .asc files to .xml 2. 'make sql' --> converts .xml files to .sql (and builds the autocompletion list)
You could dry-run any of those last three steps by hand, if you wanted to check success at each stage. Typically 'make xml' has the most useful output, namely the "CHECK" lines on stderr that indicate problematic titles or soloists. Search for the word "gotcha" in asc-to-xml.pl for more explanation about that. Basically, asc-to-xml.pl is tentative about certain expansions of abbrevations in titles and soloists; you may have to add the title or soloist in question to the okay_titles or okay_soloists hash in asc-to-xml.pl, or, somewhat less often, modify gotcha_titles or gotcha_soloists.
All data cleaning occurs in the 'make xml' step. This is because we want the .xml files to represent exactly the same data as is in the MySQL database, since we encourage people to download the XML files if they need local data. Thus, the asc-to-xml.pl script is much more complex than xml_to_sql.py.
The Makefile also does a bit of renaming, because we don't publish one of the opera databases (it's idiosyncratic, incomplete, and mainly for Henry's private use) but we do publish the main opera database. See the Makefile for details.
The last step of data-loading should be run by hand. Do this on the live server, with the right passwords in place in the Makefile:
$ make db
Tweak the "last updated" date in index.html appropriately, and commit.
This completes the conversion. The database is now upgraded.
(Back to the search page.)