Python for Perl People (Ben's 3 bits)
======================

This is Ben's quickie 10-minute summary of Python, mostly shamelessly
ripped and compressed from Guido's online tutorial (see python.org).

This is mainly meant to get kfogel and cmpilato up and running in
almost no time for the svn test suite, which is now in progress.


Factoids
--------

* unlike perl (but like scheme), python is an interactive interpreter
  if you run it directly -- your typical {read, eval, print} loop.
  Great for experimenting and learning without continually
  editing/saving/running a script.

* interpreter is a calculator;  just enter mathematical expressions.


Data Types / Vars
-----------------

* scalar variables need no '$' prefix:

     foo = 3
     bar = "a string"

* strings can be defined with either single or dbl quotes, and can nest:

     'foo string'
     "bar string"
     'the word "foo" is neat.'
     "the word 'baz' is cool."
     'this is an \n \
      escaped single line.'

* triple quotes give you a <pre>-like effect:
      """This is a preformatted
         string and line breaks
         are magically inferred."""

* string concatenation:  the + sign is used to concatenate strings:

  *  foo = 'str1' + 'str2' ==> 
      foo == 'str1str2'

      This is true in the general case.

      For constant strings, plus sign isn't even necessary:
         foo = 'str1' 'str2' ==>
          foo == 'str1str2'

  *  Multiply sign to repeat strings:
     foo = 'str1' * 3 ==>
      foo == 'str1str1str1'

* string subscripts and Slice Notation:  

     bar = "some string" ==>

         bar[3] =='e'
         bar[0:2] == 'so'      # end index is *not* included
         bar[:2]  == 'so'
         bar[6:]  == 'tring'
         bar[-1]  == 'g'       # counting from the right
         bar[-2:] == 'ng'

* ALERT:  even though strings are readable like arrays, they're
  *immutable*.  Don't try to change them; instead, make a new string:

     >>> bar = "some string"
     >>> bar[0] = 'e'         
      (ERROR...)
     >>> bar = 'e' + bar[1:]
     >>> bar
     'eome string'
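
  A small sketch of slicing and immutability, using the same "some
  string" value.  The variable names are made up, and print(...) is
  written with parens so the snippet also runs on newer Pythons:

```python
bar = "some string"

# Slices are half-open: bar[i:j] runs from index i up to, but not including, j.
first_two = bar[:2]
rest = bar[2:]

# bar[:i] + bar[i:] always rebuilds the original string.
assert first_two + rest == bar

# Item assignment on a string raises TypeError, so build a new string instead.
try:
    bar[0] = 'e'
    changed_in_place = True
except TypeError:
    changed_in_place = False

new_bar = 'e' + bar[1:]
print(new_bar)
```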


* Lists:  use square brackets.  Can also be indexed/sliced/added.

  Note "slice notation".

    >>> foo = ['a', 'boo', 17, 88]
    >>> foo[2]
    17
    >>> foo[1:3]                 # slice notation
    ['boo', 17]
    >>> foo = foo + [21, 22]
    >>> foo
    ['a', 'boo', 17, 88, 21, 22]

  Lists are MUTABLE, unlike strings:

    >>> foo[1:3] = ['b', 'c']         # replace part of list
    >>> foo
    ['a', 'b', 'c', 88, 21, 22]
    >>> foo[0:2] = []                 # delete part of list
    >>> foo
    ['c', 88, 21, 22]
    >>> foo[1:1] = ['mmm', 'good']    # insert into list
    >>> foo
    ['c', 'mmm', 'good', 88, 21, 22]
    >>> foo[1] = [1, 2, 3]            # nest a list
    >>> foo
    ['c', [1, 2, 3], 'good', 88, 21, 22]
    >>> foo[1][2]                     # multidimensional arrays
    3


* Tuples:  use parens.

  Tuples are lists which are IMMUTABLE.  Useful when you want
  *protected* list data.

    >>> w = 2, 8, "bloo"       # "packing" args into a tuple
    >>> t = (2, 8, "bloo")     # same semantic as above
    >>> e = ()                 # create the empty tuple
    >>> a, b, c = w            # "unpack" the tuple into multiple vars
    >>> l = ("bloo",)          # trailing comma means create 1-element tuple
    >>> l
    ('bloo',)                  # ugly syntax, but necessary.


* Dictionaries:  use curly braces.

  A dictionary is a mutable hash; the only restriction is that the key
  must be IMMUTABLE.  (A tuple can be used, provided it contains no
  mutable objects inside it anywhere.)

    mydict = { key1 : val1, 
               key2 : val2,
               key3 : val3 }

    >>> mydict = { 'a':'moo', 'q':'cow', 13:999}
    >>> mydict['a'] 
    'moo'
    >>> mydict['q'] = 'blah'
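
  A hedged sketch of the key restriction:  a tuple of immutables works
  as a key, a list doesn't.  The 'grid' dictionary and its contents
  are invented for illustration.

```python
# Keys must be immutable: strings, numbers, and tuples of immutables all work.
grid = {(0, 0): 'origin', (1, 2): 'somewhere'}
grid[(3, 4)] = 'elsewhere'          # add a new key
grid[(0, 0)] = 'start'              # overwrite an existing key

# A list is mutable, so using one as a key raises TypeError.
try:
    grid[[5, 6]] = 'nope'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(grid[(1, 2)])
print(list_key_ok)
```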
    
* Misc goodies

  * multiple assignment is fine:
      a, b = 1, 2

  * string or list length, use builtin "len":
       >>> len(bar)
       11

  * unicode strings start with u, and can contain embedded \uXXXX chars:
       bar = u'Hello\u0020World'

  * print command

      >>> print "hello"
      hello
      >>> word = "cow"
      >>> print "the secret word is " + word + " doncha know?"
      the secret word is cow doncha know?

      or instead of using + concatenation, use a comma, which adds
      whitespace between each item.

      >>> print "the secret word is", word, "doncha know?"
      the secret word is cow doncha know?

      The print command also can take C-like formatters, provided you
      use the % operator:

      >>> print "%2d is bigger than %3d" % (x, y)

  * The keyword 'None' is a generic placeholder of nothingness, much like
    'nil' in scheme or NULL in C.  You'll see it returned from
    functions occasionally.
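
  The % formatting mentioned under the print command above never says
  what x and y are, so here is a sketch with made-up values.  The
  formatted string is built first and then printed, which works the
  same on old and new Pythons:

```python
x, y = 42, 7     # made-up values for illustration

# %2d and %3d pad each integer to a minimum width of 2 and 3 characters.
line = "%2d is bigger than %3d" % (x, y)
print(line)

# %s accepts any value; a single value does not need a tuple.
greeting = "the secret word is %s" % "cow"
print(greeting)
```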


Formatting
----------

  * all subordinate code blocks _must_ have indented bodies, and
    indented by the same amount.  It's a way of enforcing syntax
    (instead of using curly braces.)

    DON'T freak about this!  Emacs' python-mode does all the indenting
    for you automagically; there's no thought involved.

  * semicolons.  There are NO semicolons at the ends of statements.
    Just like Bourne shell, they only exist to separate multiple
    commands on one line.  Standard style is to never, ever use them.



Flow Control
------------

Conditions are like C, but don't require parens, and end with colons.

* if/elif/else

  if y > 3:
       do something
  elif b == 7:
       do something else
  else:
       do default thing

* while

   while a != b:
       do something

* for 

  For loops happen over a *list* or *string*, like perl's "foreach" loop.

  foo = ['boo', 'bar', 'baz', 'bop', 'bing']
  for word in foo:
      print word

  WARNING:  don't change the sequence you're iterating over. 
  Instead, make a *copy* of it using slice notation --  foo[:] is an
  implicit copy of foo -- and iterate over the copy.

  for word in foo[:]:       # iterating over foo's elements, not foo itself
      modify foo somehow    # you can safely change foo now.
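
  One concrete (and hypothetical) way to "modify foo somehow":  drop
  every word starting with 'ba' while looping over the copy.

```python
foo = ['boo', 'bar', 'baz', 'bop', 'bing']

# Loop over a copy (foo[:]) so that removing items from foo itself
# cannot make the loop skip elements.
for word in foo[:]:
    if word.startswith('ba'):
        foo.remove(word)

print(foo)
```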
  
* range  

  This is totally new and neato; a way of magically generating a list
  of numbers on the fly.

     range(5)        ==>  [0, 1, 2, 3, 4]
     range(3, 7)     ==>  [3, 4, 5, 6]
     range(0, 10, 3) ==>  [0, 3, 6, 9] 

  Generate the list on the fly and use it to iterate:

  foo = ['boo', 'bar', 'baz', 'bop', 'bing']
  for i in range(len(foo)):
      print i, foo[i]
 
* break and continue commands:  exactly like C, just what you think.

* Loop... else:

  This is another new feature.  You can optionally place an 'else:'
  clause directly after a loop.

    * It will run if the loop condition ever turns false
    * It will run if the loop exhausts the list it's iterating over
    * It will *not* run if you 'break' out of the loop.

  while foo < 100:
      do blah
  else:
      print "foo hit 100"
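
  A sketch of the 'break' case, which is where loop-else earns its
  keep.  The function name and sample lists are made up:

```python
def find_first_negative(numbers):
    """Return a message describing the first negative number, if any."""
    for n in numbers:
        if n < 0:
            result = "found %d" % n
            break                      # skips the else clause
    else:
        # Runs only when the loop finished without hitting 'break'.
        result = "no negatives"
    return result

print(find_first_negative([3, 1, -4, 1, -5]))
print(find_first_negative([3, 1, 4]))
```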

* pass

   New feature.  'pass' command does nothing.  Since there are no
   curly braces to define code blocks, it's what you use to get the
   syntactic equivalent of {}.

   while 1:
       pass


Functions
---------

* Use 'def' to define a func.

    def foo(x, y, z):
        "Optional docstring"
        statement1
        statement2
        statement3
        return x

    .... 'nuff said.

* Scoping:  no 'my' keyword is needed.  Variables in a func are
  locally scoped, overriding any global variable names.

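  Args are passed in by reference, as in perl:  a func can modify a
  mutable arg (list, dictionary) in place, but assigning a new value
  to an arg name only rebinds the local name.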

* "default" function args.  neato.

     def foo(x, y=3, z="bar"):
         ...

   This function can now be called with either 1, 2, or 3 args.  Any
   missing args will assume default values.  (Default values get
   evaluated *only* once, when the 'def' statement is executed, not
   each time the function is called.)
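
   Defaults are evaluated once, when the 'def' runs, so a mutable
   default is shared between calls.  A sketch of the gotcha and the
   usual workaround (function names are invented):

```python
def remember(item, seen=[]):
    # The [] default is created once, when 'def' runs, and is then
    # shared by every call that omits 'seen'.
    seen.append(item)
    return seen

first = remember('a')
second = remember('b')        # same list object as before
print(second)

def remember_fresh(item, seen=None):
    # Common workaround: use None as the default and build the list inside.
    if seen is None:
        seen = []
    seen.append(item)
    return seen

fresh_a = remember_fresh('a')
fresh_b = remember_fresh('b')
print(fresh_b)
```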

* Calling funcs flexibly.

  Instead of being forced to call foo(x, y, z), the args can be
  specified in *any* order, provided you treat them like "key = val":

     foo(y = 2, x = 9, z = 8)

  Or you can combine normal args with keyword args, provided the
  keyword args all come at the end.  This is a complex topic, so I'll
  stop here.  :)
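
  A short sketch of the calling styles above, reusing the foo(x, y=3,
  z="bar") signature.  The body just returns its args so the binding
  is visible:

```python
def foo(x, y=3, z="bar"):
    return (x, y, z)

all_keywords = foo(y=2, x=9, z=8)      # any order when every arg is named
mixed = foo(9, z=8)                    # positional first, keywords after
only_required = foo(1)                 # y and z fall back to their defaults

print(all_keywords)
print(mixed)
print(only_required)
```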

* Anonymous lambda functions

  Required to be a single expression.

     foofunc = lambda a, b: a + b

     def add_factory(n):
         return lambda x, incr=n: x + incr
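
  Calling the two definitions above might look like this (add_five is
  a made-up name):

```python
foofunc = lambda a, b: a + b

def add_factory(n):
    # incr=n captures the current value of n as a default argument.
    return lambda x, incr=n: x + incr

add_five = add_factory(5)
total = foofunc(2, 3)
bumped = add_five(10)

print(total)
print(bumped)
```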

* Docstring conventions

     def foo(x):
         """A short one-line description of foo.

         A detailed, multi-line description that
         states that we will munge X into a new value.
         """
         statement1
         statement2



Sweet Builtin Methods
---------------------

Everything is an object, even day-to-day, banal data types.  Methods
are accessed with a ".", just like in C/C++.  

* List methods.  Assuming foo is a list,

     foo.append(x)                add an item x to the end of foo
     foo.extend(L)                add each item x in L to the end of foo
     foo.insert(i, x)             insert item x at position i in foo
     foo.remove(x)                remove the _first_ x found in foo
     foo.pop()                    remove and return last item in foo
     foo.pop(i)                   remove and return i-th item in foo
     foo.index(x)                 return offset of item x in foo
     foo.count(x)                 return number of occurrences of x in foo
     foo.sort()                   sort foo in place
     foo.reverse()                reverse foo in place
     
  Hot Tip:  a list can be used as a stack;  just use append() and pop().
  Hot Tip:  a list can be used as a queue;  just use append() and pop(0).
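
  The two hot tips, sketched side by side:

```python
stack = []
stack.append('a')
stack.append('b')
top = stack.pop()          # last in, first out

queue = []
queue.append('a')
queue.append('b')
front = queue.pop(0)       # first in, first out

print(top)
print(front)
```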

* Dictionary methods.  Assuming foo is a dictionary (hash),

     foo.keys()                   return list of all keys
     foo.values()                 return list of all values
     foo.has_key(x)               is key x in the hash?  returns 1 or 0.
     foo.items()                  return list of tuples in dictionary,
                                  e.g. [(foo, bar), (baz, bop), ...]

     If looping over a hash, it's best not to invoke foo.keys(), since
     it generates a new list and probably wastes memory.  It's much
     more efficient and faster to do:
         
          for key in foo:
             value = foo[key]

     For really large hashes, take a look at iterkeys(), itervalues(),
     iteritems().
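
     A sketch of looping over a hash directly.  The 'ages' data is
     made up, and the snippet sticks to calls that behave the same on
     old and new Pythons:

```python
ages = {'ann': 31, 'bob': 27, 'cy': 45}     # made-up sample data

# Looping over the dictionary itself yields its keys.
total = 0
for name in ages:
    total = total + ages[name]

# items() gives (key, value) tuples; sort them for a predictable order.
pairs = list(ages.items())
pairs.sort()

# The 'in' operator also tests for a key.
has_bob = 'bob' in ages

print(total)
print(pairs)
```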


* del

     del foo[i]                   delete i-th item from foo
                                  (if a dictionary, remove the key 'i')
     del foo[i:j]                 delete slice from foo
     del foo                      delete entire variable
     
* special test operations

     foo in bar                   test if foo is in list bar
     foo not in bar
  
     a is b                       test if two objects are the *same* object.
     a is not b

  Also:  you can directly use < and > operators to compare lists;
  very interesting (but deterministic) behavior results... see the
  tutorial for more info, section 5.6.

* map -- applies a func to a list  (just like perl)

    map(func, list)  ==>  [list of all return values]
    
* filter -- like map, but returns only the items for which func
  returns *true*

    def f(x):  return x % 2 == 0

    filter(f, range(1, 10))  ==>   [2, 4, 6, 8]
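
  map and filter together, as a runnable sketch.  The helper names
  are made up.  The list() wrappers are harmless where map/filter
  already return lists, and are needed on newer Pythons where they
  return iterators:

```python
def double(x):
    return x * 2

def is_even(x):
    return x % 2 == 0

doubled = list(map(double, [1, 2, 3]))          # apply double to each item
evens = list(filter(is_even, range(1, 10)))     # keep items where is_even is true

print(doubled)
print(evens)
```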

* List Comprehensions -- a way to really flexibly create lists!

  In essence, create a list by placing code into list brackets:

     [expression for x in y (if/for...) ]

  Examples:

     vec = [1, 2, 3]

     [3*x for x in vec]                 ==>  [3, 6, 9]
     [3*x for x in vec if x > 1]        ==>  [6, 9]
     [x+y for x in vec for y in vec]    ==>  [2, 3, 4, 3, 4, 5, 4, 5, 6]   

  Notice the implicit double loop in the last example. :)



Modules & Packages
------------------

* The 3-bits:  

  * place a bunch of functions in a file 'foomodule.py'.  

  * load the file -- either into another script or into the
    interpreter -- with the command 'import foomodule'

  * call the routines like so:  foomodule.func1()

Ain't that simple?  Great for debugging interfaces.  ;)


* Gravy
 
  Each module has its own symbol table.  When you load a module, the
  funcs don't enter your own symbol table -- just the module itself.

  * option: import specific funcs *right* into your symbol table: 
           from foomodule import func1, func2

  * module search path:  looks in $PYTHONPATH

  * byte-compiled modules:  *.pyc, much like .elc files.

  * "Standard" modules:  see Library Reference

     'sys' module is the most important, built into the interpreter:

        perl's @ARGV ==> sys.argv list

        sys.ps1 is the interpreter's prompt string.

        sys.path starts as the list of dirs in $PYTHONPATH, and a script
        can append to it.

  * dir command:  lists exported API of module

      >>> import foomodule
      >>> dir(foomodule)
      [list of public vars and funcs imported]
      >>> dir()
      [list of all names defined in the current scope]
      >>> import __builtin__
      >>> dir(__builtin__)
      [all symbols built into the interpreter, part of '__builtin__' module]


* Packages, or, How to Imitate Java

  In a nutshell:  .py files can be organized in directory hierarchies
  so that each file takes on a module-name with "dots", e.g.

     import MainPkg.subpkg.module
     MainPkg.subpkg.module.func1()

        or any variation of

     from MainPkg.subpkg.module import func1
     func1()

  To make this happen, each subdir must contain an __init__.py file
  (possibly empty), which marks it as a package and can describe the
  available modules and symbols.  Read all about it in tutorial
  sections 6.4 and 6.5.


Formatting Tools
----------------

* ` `  : convert any value into a string. 

   This is actually the repr() function, but backticks are a shortcut.

* Methods from the "string" module

   import string
   string.rjust(str, x)                   right-justify str in a field x wide
   string.zfill('12', 5) ==> '00012'      pad left of string with 0's
   string.atoi                            convert string to int

   Note:  it's generally better to use builtin string methods, rather
   than stuff from the string module.
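
   A sketch of the builtin string methods that cover the same ground
   (the variable names are invented):

```python
s = '12'

padded = s.zfill(5)            # same result as string.zfill('12', 5)
right = s.rjust(5)             # right-justify in a field 5 characters wide
number = int(s)                # string to integer, like string.atoi
words = 'a,b,c'.split(',')     # split on a separator
joined = '-'.join(words)       # join a list back into one string

print(padded)
print(joined)
```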


* File I/O

   >>> f = open('filename', 'w')

       modes are any of:  'r'ead, 'w'rite, 'a'ppend, 'r+' to read and write.
       if mode is not given, 'r' is default.

   >>> f.read(size)      # read size bytes from descriptor
   >>> f.read()          # read entire file into RAM
   >>> f.readline()      # read one line of file
   >>> f.readlines()     # return a list of all lines
   >>> f.write(string)   # write string to file. (see ` ` operator above)
   >>> f.tell()          # get file position
   >>> f.seek(offset)    # set file position
   >>> f.close()         # close descriptor

   >>> raw_input(str)    # ask user "str" question, read back reply.
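
   A write-then-read round trip, as a sketch.  The scratch file comes
   from tempfile.mkstemp, which is an assumption of this example (it
   isn't mentioned above and needs a newer Python than the one shown
   in this document):

```python
import os
import tempfile

# Hypothetical scratch file; mkstemp picks a unique name for us.
handle, path = tempfile.mkstemp()
os.close(handle)

f = open(path, 'w')
f.write("first line\n")
f.write("second line\n")
f.close()

f = open(path)                 # mode defaults to 'r'
first = f.readline()           # one line, newline included
rest = f.readlines()           # remaining lines as a list
f.close()

os.remove(path)                # clean up the scratch file
print(first)
print(rest)
```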


* Pickle Module -- for persistent objects.

  REALLY USEFUL.  Allows you to serialize almost any data object (list,
  dictionary, etc.) into a string!  One 'pickles' and 'unpickles' data:

   >>> import pickle
   >>> pickle.dump(x, f)   # dump object x into file f
   >>> x = pickle.load(f)  # read object x from file f
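
  The "into a string" flavor uses pickle.dumps and pickle.loads
  instead of a file.  A sketch with a made-up object:

```python
import pickle

data = {'name': 'svn', 'tags': ['a', 'b'], 'count': 3}   # made-up sample object

blob = pickle.dumps(data)        # serialize to a string (bytes on newer Pythons)
copy = pickle.loads(blob)        # rebuild an equal, independent object

print(copy == data)
print(copy is data)
```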


Regular Expressions (read carefully!)
-------------------

* import re

* to do a regexp search:  matchobj  = re.search(pattern, string)
              
* for more efficiency:    re_object = re.compile(pattern)
                          matchobj  = re_object.search(string)
                          matchobj2 = re_object.search(string2)
                          ...

  Essentially, a regexp pattern is compiled into a little "machine",
  and then you can pass strings into that same machine over and over.

* What's up with the "matchobj" thing?

  If a string contains the pattern, a MatchObject is returned.  Else,
  the None object is returned.

  If you get a MatchObject, you can query the thing for $1, $2 groups
  and so on.  See the "re" module documentation.
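
  A sketch of pulling groups out of a MatchObject.  The pattern and
  the sample strings are made up for illustration:

```python
import re

# Compile once, then reuse the same "machine" on several strings.
rev_re = re.compile(r'r(\d+) by (\w+)')

match = rev_re.search("committed r1234 by kfogel")
if match:
    revision = match.group(1)      # like $1 in perl
    author = match.group(2)        # like $2
    print(revision + " " + author)

# No match returns None, so test the result before using it.
no_match = rev_re.search("nothing to see here")
print(no_match is None)
```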
     


Exceptions
----------

Just Like Java.

  while x < 10:
           try:
               do blah
               do blah
           except BlahError:
               print "caught blah error"
           except BlooError:
               print "caught bloo error"

If the exception doesn't match, it's thrown upward to outer 'try'
statements.

Optional default handler --

           except:
               print "caught unknown error"
               raise       # re-raise the exception to my caller

Optional 'else' clause can follow the exception handling too, in case
you want code to execute if no exception is raised.
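
A sketch with real exception classes and an 'else' clause.  The
function name is invented, and it returns its messages so the result
is easy to check:

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        return "caught division by zero"
    except TypeError:
        return "caught bad operand types"
    else:
        # Runs only when the try block raised nothing.
        return "result is %s" % result

print(safe_divide(6.0, 3))
print(safe_divide(6, 0))
print(safe_divide(6, "x"))
```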

See section 8 on how to define custom exceptions and cleanups, etc.


Classes
-------

There's a *real* class system goin' on in Python.  You can define
classes, instantiate them, subclass them, define constructors, get
multiple inheritance, etc.

But not now.  Not here, not yet.  :)


Debugger
--------

The interactive Python debugger is part of the standard dist; it's
called pdb.  See http://python.org/doc/current/lib/module-pdb.html.

The niftiest thing about pdb is that you can enter the interactive
debugger programmatically, even if you weren't running your program
under the debugger from the start!  Use the pdb.set_trace() method,
like this:

   $ cat mydebug.py
   #!/usr/bin/env python
   
   import pdb
   
   def debug_at_7(i):
     if i == 7:
       print "About to enter interactive debugger:"
       pdb.set_trace()
     else:
       print "iteration %d" % i
      
   for i in range(0, 10):
     debug_at_7(i)
   $ 

Note that the stack frame you're in when you hit the debugger will be
inside the debugger code, so you'll have to go up a couple of frames
to inspect your own code:

   $ ./mydebug.py
   iteration 0
   iteration 1
   iteration 2
   iteration 3
   iteration 4
   iteration 5
   iteration 6
   About to enter interactive debugger:
   --Return--
   > /usr/local/lib/python2.2/pdb.py(904)set_trace()->None
   -> Pdb().set_trace()
   (Pdb) where
     /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(13)?()
   -> debug_at_7(i)
     /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(8)debug_at_7()
   -> pdb.set_trace()
   > /usr/local/lib/python2.2/pdb.py(904)set_trace()->None
   -> Pdb().set_trace()
   (Pdb) up
   > /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(8)debug_at_7()
   -> pdb.set_trace()
   (Pdb) up
   > /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(13)?()
   -> debug_at_7(i)
   (Pdb) list
     8  	    pdb.set_trace()
     9  	  else:
    10  	    print "iteration %d" % i
    11  	   
    12  	for i in range(0, 10):
    13  ->	  debug_at_7(i)
   [EOF]
   (Pdb) print i
   7
   (Pdb) quit
   [...]
   $