THE WISHLIST (TM)
(or, Our Plans For The Recent Future)

About the organization of this file: there are two considerations to take into account when prioritizing a wishlist. The first is the usefulness of a proposal, and the second is the ease of implementing it. There is not necessarily any direct relationship between the two, but both affect the ordering of the list. As in, "If I can have better base colors in five minutes, that's worth more to me than having secondary-structure browsing in five years."

In an ideal world, it would be possible to somehow multiply those two figures (usefulness and ease of implementation) together and arrive at a single "priority index". Then we could just start implementing the items with the highest index first, and work our way down the list. However, it's not always clear exactly what the two values should be, so I haven't organized the list quite like that. Instead, it is divided into several sections:

KNOWN PRIORITIES: this is the top of the main stack. It lists, in order, the features we should be working on over the next few months. They are large features, not small, well-defined changes.

TRIVIAL STUFF: these are features which would take little or no time to implement. It's okay to do one of them before a known priority, because so little time will be lost. Just play these by ear and implement one when your brain is telling you that today it wants a small, neat problem. They should all be done fairly soon (i.e.: March).

SEMI-TRIVIAL STUFF: this is somewhat harder than the trivial stuff. A semi-trivial feature probably wants a day or two of solid work to get it up and running. Play these by ear too... but they should all be taken care of by the time the known priorities are completed.

NON-TRIVIAL STUFF: Well. Here we are. These changes are just as major as the known priorities, only we didn't find them important enough to be put in that category. So they went here instead.
I have a feeling we won't be touching this stuff for a while.

THE DISTANT FUTURE: This is like above, only more so. :-) No, seriously, there's no time frame for these things. They're here only so we don't forget about them -- this section is just a place to record them. They'll all happen someday, no doubt, but not right now.

UNORDERED: This is stuff I didn't have any ideas about how to order. Some of it might be important; I'd appreciate it if you guys could check it out and suggest revisions and/or good placements for the items here.

We have also discussed the following things, although they don't appear in the list in any particular order because they're too general:

   - An interface and semantics for ordering, phylogenetic and otherwise.
   - External analysis programs in general
   - phylo interface (what exactly do we mean?)  } are these two
   - ordering method                             } really one problem?
   - subalignment editing

-*- -*- -*- -*- -*- -*- -*- KNOWN PRIORITIES -*- -*- -*- -*- -*- -*- -*-

0. Phylo browser features (Jim's current stack).

   * (Niels, Jim) Add the features to Phylo-browser/selector that Jim and I agreed on November 18 - main points in short: Reestablish menu-bar; handle all cases where there is not a 1:1 correspondence between sequences and taxa in phylo list; write visible part of phylo buffer to user-named file; import/export selections to user-named file; scrollbar; conserve some native Emacs functionality so edits are possible; incremental addition of sequences; more details.

1. Hook up Ross' searching code (Pace Lab wants this too).

   (yes; for example: Make a 'hard' selection, click an option which composes a motif (in ross-language), which is then fed to his pattern matcher, which returns the areas of match; see matches highlighted. This can be taken much further, e.g. secondary structure; his program is more powerful than regexps. /Niels)

   [ Pavan and I are working on this, I think. -Karl ]

2.
Hook up some treeing programs (Pace Lab, priority 4)

   * Ability to call FastDNAml. The operation sometimes takes days; the editor shouldn't wait for the program to finish. It would be nice to be able to run FastDNAml via rsh.

3. Hook up automatic alignment programs (Clustal, Pal (?), etc).

4. Other file formats. Pace Lab mentions the following, in order of priority:
   a) Phylip --- very important
   b) Nexus
   c) GCG --- indexed formats, and multiple-sequence format.

   Jim sez: Perhaps the best thing to do would be to modify ReadSeq to support tabl format, and then have Ale use ReadSeq for all its conversions.

   Niels concurs that ReadSeq is a good idea.

* (All) masks

   [ Exactly what is meant by masks is not yet well-defined; what sort of interface do we want? -Karl ]

   [ Ah -- it seems that masks are often handled directly by the receiving program, and all we have to do is pass the mask in correctly as a string along with the sequences in question. -Karl ]

   (This is a place where I think it's important to make code general and ready for unexpected uses. Masks ought to appear as highlighted bars where characters can be edited underneath; the bars can extend vertically through all sequences or be restricted to chosen groups, but should default to the sequences they were generated from. They need to be generated automatically at least as a group operation, but manual addition/subtraction operations are needed too. This kind of display is much better than old-style extra mask-lines: They can be switched on/off without cluttering the alignment; can be applied to sequences elsewhere in the alignment (by changing a group) to e.g. examine if alignment is solid throughout; could be used to display data read from file which apply to a set of sequences; one doesn't have to look away from the sequence of interest to see what is included.
   It would be nice if masks could be controlled from a frame similar to the current group-frame, so that it is clear that groups and masks can be combined freely and the user can submit any intersection to any available relevant analysis. Making new masks by logical operations on old ones is relevant. (The column compositions could perhaps be pre-calculated, so masks and consensus sequences can be generated from them, rather than looking directly in the alignment). /Niels)

   [ Agree that we need masks; think maybe the old-style mask line is actually a win. It wouldn't require much extra support from us, and vertical-overlay-style masks would be difficult to implement, possibly slow down redisplay, and cause more visual confusion on an already confused display. Also, people are already used to dealing with old-style mask lines. -Karl ]

6. I think the way Ale reads and writes files needs some serious re-thinking. The current system is consistent in some ways which are useless, and inconsistent in other ways where consistency would be useful. See my E-mail of Feb 6 1995 for the rationale for these suggestions.

   In this model, the alignment buffer corresponds to a single file; by default, all the sequences in the alignment buffer, and only the sequences in the alignment buffer, go into the file. GDBM files are treated specially.

   Open
      Read a file's contents into the alignment buffer, which must be empty. Mark the buffer to save to that file by default. Like Emacs's C-x C-f, Macintosh's "Open".
   Save
      Write the alignment buffer's contents back to its default file, replacing the file's contents entirely. Like Emacs's C-x C-s, Macintosh's "Save".
   Save As
      Solicit a filename from the user. If the file exists, verify that the user intends to overwrite it. Write the alignment buffer to that file, replacing the file's contents entirely. Make that file the buffer's default file. Like Emacs's C-x C-w, Macintosh's "Save As".
   Insert
      Solicit a filename from the user.
      Add the sequences in that file to the current alignment buffer. Leave the alignment buffer's default file unchanged (i.e. future "Saves" will put all the sequences in the same file). Like Emacs's C-x i; the Mac accomplishes this with copy and paste commands.
   Phylogenetic List
      Solicit a filename from the user. Display a phylogenetic listing of that file's contents. Let the user select sequences in the phylogenetic listing. Insert the selected sequences of the file into the current alignment buffer, as for "Insert".

   On the "group" menu, there should be two commands:

   Save Group To File
      Solicit a filename from the user. If the file exists, verify that the user intends to overwrite it. Write the sequences in the group to that file, replacing the file's contents entirely. Like Emacs's M-x write-region; the Mac accomplishes this with copy and paste commands.
   Merge Group With File
      Solicit a filename from the user. Write the sequences in the group to that file, preserving those sequences in the file which are not in the group. No analog in Emacs or the Mac.

   The above descriptions apply only to flat files. GDBM files should be handled differently. All saves to GDBM files preserve sequences that are in the file but not in the buffer. (What should happen when one cuts a sequence from a buffer visiting a GDBM file, and then saves?)

-*- -*- -*- -*- -*- -*- -*- TRIVIAL STUFF -*- -*- -*- -*- -*- -*- -*- -*-

* Would it be good to have a Help menu in the browser's menu bar which can bring up the phylo-relevant section of the info file?

* It would be nice if the complement/reverse commands could be applied to groups as well as single sequences.

* use rect-mark.el to get rectangle dragging w/ selections (thank you, Rick Sladkey). I have a copy in ~kfogel/elithp/.

* color thoughts:
   - put group colors under user control (fore & back *separately*)
   - give them ability to color character backgrounds as well as foregrounds.
   - maybe ability to color the green ID bar.
   - put fore/background under user control generally.

* From: Niels Larsen
  To: kfogel@floss.life.uiuc.edu
  Subject: for info ?
  Date: Wed, 15 Mar 1995 12:06:30 -0600

  I did this for my own help button; I want to have a friendly Emacs environment, which I can then give away to friends in a friendly way. Maybe bad code, but maybe it could be used for our on-line help by changing Emacs to ALE ?

  (copy-face 'default 'info-xref)
  (set-face-background 'info-xref "khaki1")
  (set-face-foreground 'info-xref "black")

  (copy-face 'default 'info-node)
  (set-face-background 'info-node "SteelBlue4")
  (set-face-foreground 'info-node "white")

  (defun nl-help-frame-create ()    ;; (namestr) howto?
    (interactive)
    (let ((curbuf (current-buffer))
          (nl-help-alist
           '((name . "Emacs On-Line Help")
             (left . 5)
             (top . 5)
             (height . 60)
             (width . 90)
             (font . "fixed")
             (foreground-color . "white")
             (background-color . "grey30")
             (internal-border-width . 4)
             (mouse-color . "khaki1")
             (cursor-color . "khaki1")
             (menu-bar-lines . 1)
             (visibility . t))))
      (Info-goto-node "(emacs)Top")
      (new-frame nl-help-alist)
      (switch-to-buffer curbuf)))

* (Niels) wants `unreadable' font available, for overview.

* lock/unlock all sequences

* go to organism regexp --> just use ID Find-all, then have group movement commands.

   [ Ah, umm. Okay; must think carefully about keybindings for the group movement commands, though. -Karl ]

* check out DCSE 3.0 (just to steal features, maybe)

* (Niels) We should ask for Jim's little program that compares sequences before and after exit. Would be good to guarantee reliability at the sequence level.

   [ Yeah -- I think this is being documented and is ready to use now. -Karl ]

   [ This has been documented, but it isn't called automatically. It could be most happily checked against the original file after we write out the tabl-format data, but before we convert it to the destination format.
   - JimB ]

* (Niels) 0.1/0.2 documentation: List of what hardware/software is required; Intro (why Emacs, overall status); exact stepwise installation instructions; tutorial for computer dummies that quickly walks through best features (I know the dummy language and could write it); known problems section; key rebinding section. (I find key binding important because each extra keystroke saved increases editing speed; we can benefit by receiving maps from different machines, keyboards or terminal emulators, including them in .gyppu, and redistributing them.)

-*- -*- -*- -*- -*- -*- SEMI-TRIVIAL STUFF -*- -*- -*- -*- -*- -*- -*- -*-

* (Will Fischer) Guess a reasonable default color map by looking at the sequences to see if they're amino or nucleic.

* (Karl, Niels) named placeholders (bookmarks, essentially)

* (Karl, Niels) Double click in ID buffer to add/delete seq from group. Single click should just move point.

* (Niels) Include number of seqs in each group in the group frame entry.

* (Niels) Printing. Look into psprint.el package. Ask Terry Gaasterland if we can have Belmont code for PostScript 'pretty-prints'.

-*- -*- -*- -*- -*- -*- UNDENIABLY NON-TRIVIAL STUFF -*- -*- -*- -*- -*- -*-

Wow, this section has been cleaned out. It used to have two things in it, but they're done (or at least moved somewhere else).

-*- -*- -*- -*- -*- -*- -*- THE DISTANT FUTURE -*- -*- -*- -*- -*- -*- -*-

* consensus calculation. (re: conversation w/ Terry Marsh)

* generating selections based on composition; in general, mechanical ways of generating selections
  "rectangles are nice, but with 3000 sequences..."

* (Terry Gaasterland) be able to *replace* files. IOW: be able to take files/sequences out, load new ones in, have clean interface to doing so. Just like C-x C-v.

   [ Need to discuss w/ Jim about easiest way to do this. Mainly interface questions; right now we don't keep a record of all the files in the editor, although that information could be deduced from examining the buffer.
   We want to offer a menu of files to be taken out: any sequences or sequence fragments in those files would then be removed from the editor. Then there can be a `replace' command which simply calls the previously-described command, then `open'. -Karl ]

* (Carl?) diff two sequences? (Or is that a job for a subprocess? Is it useful before they've been aligned? What exactly does it mean?)

   (again, please use highlight, not an ae2-style extra line. /Niels)

   [ Again, we'll have to decide whether it's worth the extra work. :-) -Karl ]

* (Niels) Helix-handling. Pair checker.

  From: DAMBERGER
  To: kfogel@cyclic.com
  Subject: Re: any luck?
  Date: Tue, 21 Mar 1995 08:11:57 -0700 (MST)

  >
  > >I'd almost be ready to use it if one could add helix information.
  >
  > Have you any thoughts about how the information would be presented?
  >

  Hmmm...In ae2 we use reverse video to highlight helices. Maybe the helices could be highlighted that way? Probably, one would also want to be able to turn this option on and off. Also, a little window to tell where the basepair partner is might be useful. The problem with using colors to indicate helices is that selections are already colored.

  Simon

* (Niels) Consistent search scopes. There seem to be three possible search ranges (current sequence, current group, whole buffer), two directions (forward, reverse), three buffers (ids, alignment, annotation; be prepared for more), and two match counts (next, all). These things are logically separate, and almost all combinations seem relevant. This is another case where generality can have unexpected payoffs; for example, searching for a probe/primer target in the whole alignment and then cutting out the group of matches would give exactly the kind of overview we need. Combination of search range, direction, buffer, and match range should therefore be unrestricted and controlled by modifier keys (search will be a frequent operation). We could list every option in a search menu, but it would be cluttered.
  Or make it consistent with edits: If in a selected group, the range is that group (and the menu could list group options only), otherwise the current sequence. Meta or Shift key could 'amplify' the search range as with movements. I agree it's okay to make a group out of the sequence(s) that match (is it easy to put the number of members in the groups frame, next to the name? It's helpful to see the number of matches). Current 'Go to organism' would fall under this. Please don't improve the regexps; Ross's pattern matcher has potential for powerful motif searches. (btw, where a given search string matches, one could give each character in the match range the property of the search string itself, and that way have a quick way of coming back to the matches later).

   [ Agree -- this seems like a very sensible way to think about searches. I will work on a consistent interface. -Karl ]

* (Niels) Window shrinking/growing commands.

   [ This refers to the split-alignment stuff. -Karl ]

* version control for sequences (integrated with some database, no doubt).

   (yes, wait a while /Niels)

* (Niels) A general 'repeat last command' function.

   [ I have put this as low priority because it's not as simple as I had hoped. The problem is that many commands take arguments/inputs of some sort, and we'd have to record those too. Will think some more. -Karl ]

* Better annotation menus, better annotation mode in general.

   [ Yes; what is useful in annotating? -Karl ]

* (Niels) Specify some programmer's guidelines, so it's easier for others to help develop.

* (Bonnie, ... everyone, really) mirrored helix movements (which highlight the helix-wise corresponding base to the current one). This probably wants the vertical overlays we were talking about; it may be far off in the future.

   (Yea, too early for this, need a maintainable secondary structure description. I may be able to interest some Michigan people to help with this part, including the alignment procedure. /Niels).

* vertical overlays (?)
  In exactly what areas would they be helpful?

   [ Difficult to implement, but useful when done. -Karl ]

-*- -*- -*- -*- -*- -*- -*- UNORDERED -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-

Jim, which of the following things are already done by your analysis code?

* (Pace Lab, priority 2) Similarity tables --- how different is each sequence from each other sequence?
   a) Over all columns in the sequence.
   b) Specify columns with masks.
   c) Specify columns with selections.

* (Pace Lab, priority 3) Compute consensus line of a group of sequences, or of entire alignment.

* (Pace Lab, priority 5) Amino <-> Nucleotide conversion
  The code to use should be a parameter. When doing the Amino->Nucleotide conversion, how to deal with multiple possible encodings?
   a) generate ambiguity codes
   b) generate "most probable"
   c) generate "least ambiguous" (most universal across organisms?)

   [ This is mostly done, though ambiguity characters could be handled better. -Karl ]

* (Karl) use real dialog boxes?

   (What's currently possible? /Niels)

* (Niels in a mail) It would be good to have a fast C function that returns the residue number of a given character, like the one the cursor is at. I don't think ae2-style edge-numbering is needed. I would like a special frame that lists 1) sequence number of cursor residue, 2) column number of cursor character, 3) reference sequence id, 4) sequence number of the residue in the reference sequence that is at the same column. With the number frame shown, numbers should update when the cursor moves (like up/down moves of annotation). This will also go in the wish list, just haven't done it yet.

* From: Jim Blandy
  To: Karl Fogel, Niels Larsen
  Subject: interesting fact
  Date: Wed, 15 Feb 1995 23:01:26 -0500

  GenBank is available in a machine-readable format --- that is, all the fields carefully parsed out in a way that is easy for machines to grok. It uses a generic standard print syntax called ASN.1. Maybe this would be helpful when the RDP converts to a full database.
* From: Jim Blandy
  To: kfogel@cyclic.com
  Subject: fixed
  Date: Sat, 18 Feb 1995 16:16:01 -0500

  I'll bet Richard wouldn't mind at all if we made x-pointer-shape a frame parameter. That would fix this problem (frame parameters aren't inherited) and relieve us from the ugly mouse color kludge.
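
A note on two of the UNORDERED items above (the Pace Lab similarity tables and the residue-number lookup Niels asks for): both are small enough to sketch. What follows is only an illustration, written in Python rather than in the editor's own Emacs Lisp or C. It assumes that aligned sequences are equal-length strings, that '-' and '.' are the gap characters, that a mask is a string of '0'/'1' with one character per column, and that "similarity" means the fraction of compared columns in which both sequences carry the same residue. All function names and conventions here are hypothetical; none of them come from Ale or from Jim's analysis code.

```python
GAP_CHARS = "-."  # assumed gap characters; Ale's actual set may differ


def residue_number(sequence, column):
    """Return the 1-based residue number at 0-based COLUMN, or None on a gap."""
    if sequence[column] in GAP_CHARS:
        return None
    # Count the non-gap characters up to and including COLUMN.
    return sum(1 for ch in sequence[:column + 1] if ch not in GAP_CHARS)


def similarity(seq_a, seq_b, mask=None):
    """Fraction of compared columns where both sequences hold the same residue.

    MASK, if given, is a string of '0'/'1'; only columns marked '1' count.
    Columns where either sequence has a gap are skipped.
    Returns None when no column can be compared.
    """
    compared = matches = 0
    for col, (a, b) in enumerate(zip(seq_a, seq_b)):
        if mask is not None and mask[col] != "1":
            continue
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return matches / compared if compared else None


def similarity_table(alignment, mask=None):
    """Map each unordered pair of sequence ids to its similarity."""
    ids = sorted(alignment)
    return {
        (x, y): similarity(alignment[x], alignment[y], mask)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
    }


if __name__ == "__main__":
    alignment = {"seqA": "AC-GU", "seqB": "ACAGA", "seqC": "AG-GU"}
    for pair, value in similarity_table(alignment).items():
        print(pair, value)
    print(residue_number(alignment["seqA"], 3))
```

With the toy alignment above, seqA and seqB agree in 3 of the 4 columns where neither has a gap, so their similarity is 0.75; restricting the comparison with the mask "11110" drops the one mismatching column and gives 1.0. A selection (option c) could be handled the same way by first turning the selected columns into a mask. Whether gapped columns should be skipped or counted as mismatches is a design decision this sketch does not settle.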