Python for Perl People  (Ben's 3 bits)
======================

This is Ben's quickie 10-minute summary of Python, mostly shamelessly
ripped and compressed from Guido's online tutorial (see python.org).

This is mainly meant to get kfogel and cmpilato up and running in
almost no time for the svn test suite, which is now in progress.


Factoids
--------

* unlike perl (but like scheme), python is an interactive interpreter
  if you run it directly -- your typical {read, eval, print} loop.
  Great for experimenting and learning without continually
  editing/saving/running a script.

* interpreter is a calculator; just enter mathematical expressions.


Data Types / Vars
-----------------

* scalar variables need no '$' prefix:

     foo = 3
     bar = "a string"

* strings can be defined with either single or dbl quotes, and each
  kind can nest inside the other:

     'foo string'
     "bar string"
     'the word "foo" is neat.'
     "the word 'baz' is cool."
     'this is an \n \
     escaped single line.'

* triple quotes give you a <PRE>-like effect:

     """This is a preformatted
        string and line breaks
        are magically inferred."""

* string concatenation:  the + sign is used to concatenate strings:

     foo = 'str1' + 'str2'     ==>  foo == 'str1str2'

  This is true in the general case.  For constant strings, the plus
  sign isn't even necessary:

     foo = 'str1' 'str2'       ==>  foo == 'str1str2'

* Multiply sign to repeat strings:

     foo = 'str1' * 3          ==>  foo == 'str1str1str1'

* string subscripts and Slice Notation (offsets start at 0, and the
  end offset of a slice is *not* included):

     bar = "some string"  ==>  bar[3]   == 'e'
                               bar[0:2] == 'so'
                               bar[:2]  == 'so'
                               bar[6:]  == 'tring'
                               bar[-1]  == 'g'    # counting from the right
                               bar[-2:] == 'ng'

* ALERT:  even though strings are readable like arrays, they're
  *immutable*.  Don't try to change them; instead, make a new string:

     >>> bar = "some string"
     >>> bar[0] = 'e'
     (ERROR...)
     >>> bar = 'e' + bar[1:]
     >>> bar
     'eome string'

* Lists:  use square brackets.  Can also be indexed/sliced/added.
  Note "slice notation".

     >>> foo = ['a', 'boo', 17, 88]
     >>> foo[2]
     17
     >>> foo[1:3]                      # slice notation
     ['boo', 17]
     >>> foo = foo + [21, 22]
     >>> foo
     ['a', 'boo', 17, 88, 21, 22]

  Lists are MUTABLE, unlike strings:

     >>> foo[1:3] = ['b', 'c']         # replace part of list
     >>> foo
     ['a', 'b', 'c', 88, 21, 22]
     >>> foo[0:2] = []                 # delete part of list
     >>> foo
     ['c', 88, 21, 22]
     >>> foo[1:1] = ['mmm', 'good']    # insert into list
     >>> foo
     ['c', 'mmm', 'good', 88, 21, 22]
     >>> foo[1] = [1, 2, 3]            # nest a list
     >>> foo
     ['c', [1, 2, 3], 'good', 88, 21, 22]
     >>> foo[1][2]                     # multidimensional arrays
     3

* Tuples:  use parens.  Tuples are lists which are IMMUTABLE.  Useful
  when you want *protected* list data.

     >>> w = 2, 8, "bloo"      # "packing" args into a tuple
     >>> t = (2, 8, "bloo")    # same semantic as above
     >>> e = ()                # create the empty tuple
     >>> a, b, c = w           # "unpack" the tuple into multiple vars
     >>> l = ("bloo",)         # trailing comma means create 1-element tuple
     >>> l
     ('bloo',)                 # ugly syntax, but necessary.

* Dictionaries:  use curly braces.  A dictionary is a mutable hash;
  the only restriction is that the key must be IMMUTABLE.
  (A tuple can be used, provided it contains no mutable objects inside
  it anywhere.)

     mydict = { key1 : val1, key2 : val2, key3 : val3 }

     >>> mydict = { 'a':'moo', 'q':'cow', 13:999}
     >>> mydict['a']
     'moo'
     >>> mydict['q'] = 'blah'

* Misc goodies

    * multiple assignment is fine:    a, b = 1, 2

    * string or list length, use builtin "len":

         >>> len(bar)
         11

    * unicode strings start with u, and can contain embedded \uXXXX
      chars:

         bar = u'Hello\u0020World'

* print command

     >>> print "hello"
     hello
     >>> word = "cow"
     >>> print "the secret word is " + word + " doncha know?"
     the secret word is cow doncha know?

  or instead of using + concatenation, use a comma, which adds
  whitespace between each item.

     >>> print "the secret word is", word, "doncha know?"
     the secret word is cow doncha know?

  The print command also can take C-like formatters, provided you use
  the % operator:

     >>> print "%2d is bigger than %3d" % (x, y)

* The special value 'None' is a generic placeholder of nothingness,
  much like 'nil' in scheme or NULL in C.  You'll see it returned from
  functions occasionally; a func that never hits a 'return' statement
  returns None.


Formatting
----------

* all subordinate code blocks _must_ have indented bodies, and every
  statement in a block must be indented by the same amount.  It's a
  way of enforcing syntax (instead of using curly braces.)  DON'T
  freak about this!  Emacs' python-mode does all the indenting for you
  automagically; there's no thought involved.

* semicolons.  There are NO semicolons at the ends of statements.
  Just like Bourne shell, they only exist to separate multiple
  commands on one line.  Standard style is to never, ever use them.


Flow Control
------------

Conditions are like C, but don't require parens, and end with colons.

* if/elif/else

     if y > 3:
        do something
     elif b == 7:
        do something else
     else:
        do default thing

* while

     while a != b:
        do something

* for

  For loops happen over a *list* or *string*, like perl's "foreach"
  loop.

     foo = ['boo', 'bar', 'baz', 'bop', 'bing']
     for word in foo:
        print word

  WARNING:  don't change the sequence you're iterating over.
  Instead, make a *copy* of it using slice notation -- foo[:] is a
  (shallow) copy of foo -- and iterate over the copy.

     for word in foo[:]:      # iterating over a copy, not foo itself
        modify foo somehow    # you can safely change foo now.

* range

  This is totally new and neato; a way of magically generating a list
  of numbers on the fly.

     range(5)         ==> [0, 1, 2, 3, 4]
     range(3, 7)      ==> [3, 4, 5, 6]
     range(0, 10, 3)  ==> [0, 3, 6, 9]

  Generate the list on the fly and use it to iterate:

     foo = ['boo', 'bar', 'baz', 'bop', 'bing']
     for i in range(len(foo)):
        print i, foo[i]

* break and continue commands:  exactly like C, just what you think.

* Loop... else:

  This is another new feature.  You can optionally place an 'else:'
  clause directly after a loop.

    * It will run if the loop condition ever turns false
    * It will run if the loop exhausts the list it's iterating over
    * It will *not* run if you 'break' out of the loop.

     while foo < 100:
        do blah
     else:
        print "foo hit 100"

* pass

  New feature.  'pass' command does nothing.  Since there are no curly
  braces to define code blocks, it's what you use to get the syntactic
  equivalent of {}.

     while 1:
        pass


Functions
---------

* Use 'def' to define a func.

     def foo(x, y, z):
        "Optional docstring"
        statement1
        statement2
        statement3
        return x

  .... 'nuff said.

* Scoping:  no 'my' keyword is needed.  Variables assigned in a func
  are locally scoped, overriding any global variable names.  Args are
  passed in by reference, as in perl:  the func gets a reference to
  the caller's object, so changes made to a mutable arg (a list, say)
  are seen by the caller.  Unlike assigning to perl's $_[0], though,
  assigning a new value to the arg name only rebinds the local name.

* "default" function args.  neato.

     def foo(x, y=3, z="bar"):
        ...

  This function can now be called with either 1, 2, or 3 args.  Any
  missing args will assume default values.  (Default values are
  evaluated *only* once, when the 'def' statement is executed, not
  each time the func is called; beware of mutable defaults such as
  lists.)

* Calling funcs flexibly.  Instead of being forced to call
  foo(x, y, z), the args can be specified in *any* order, provided you
  treat them like "key = val":

     foo(y = 2, x = 9, z = 8)

  Or you can combine normal args with keyword args, provided the
  keyword args all come at the end.
  This is a complex topic, so I'll stop here. :)

* Anonymous lambda functions

  Required to be a single expression.

     foofunc = lambda a, b: a + b

     def add_factory(n):
        return lambda x, incr=n: x + incr

* Docstring conventions

     def foo(x):
        """A short one-line description of foo.

        A detailed, multi-line description that states that we
        will munge X into a new value.
        """
        statement1
        statement2


Sweet Builtin Methods
---------------------

Everything is an object, even day-to-day, banal data types.  Methods
are accessed with a ".", just like in C/C++.

* List methods.  Assuming foo is a list,

     foo.append(x)       add an item x to the end of foo
     foo.extend(L)       add each item x in L to the end of foo
     foo.insert(i, x)    insert item x at position i in foo
     foo.remove(x)       remove the _first_ x found in foo
     foo.pop()           remove and return last item in foo
     foo.pop(i)          remove and return i-th item in foo
     foo.index(x)        return offset of the first item x in foo
     foo.count(x)        return number of occurrences of x in foo
     foo.sort()          sort foo in place
     foo.reverse()       reverse foo in place

  Hot Tip:  a list can be used as a stack; just use append() and pop().
  Hot Tip:  a list can be used as a queue; just use append() and pop(0).

* Dictionary methods.  Assuming foo is a dictionary (hash),

     foo.keys()          return list of all keys
     foo.values()        return list of all values
     foo.has_key(x)      is key x in the hash?  returns 1 or 0.
     foo.items()         return list of (key, value) tuples in the
                         dictionary, e.g. [(foo, bar), (baz, bop), ...]

  If looping over a hash, it's best not to invoke foo.keys(), since
  it generates a new list and probably wastes memory.  It's much more
  efficient and faster to do:

     for key in foo:
        value = foo[key]

  For really large hashes, take a look at iterkeys(), itervalues(),
  iteritems().

* del

     del foo[i]          delete i-th item from foo
                         (if a dictionary, remove the key 'i')
     del foo[i:j]        delete slice from foo
     del foo             delete entire variable

* special test operations

     foo in bar          test if foo is in list bar
     foo not in bar
     a is b              test if two objects are the *same* object.
     a is not b

  Also: you can directly use < and > operators to compare lists;
  very interesting (but deterministic) behavior results... see the
  tutorial for more info, section 5.6.

* map -- applies a func to a list (just like perl)

     map(func, list)  ==>  [list of all return values]

* filter -- like map, but only returns the items for which the func
  returns *true* (think perl's grep)

     def f(x):
        return x % 2 != 0

     filter(f, range(1, 10))  ==>  [1, 3, 5, 7, 9]

* List Comprehensions -- a way to really flexibly create lists!

  In essence, create a list by placing code into list brackets:

     [expression  for x in y  (if/for...) ]

  Examples:

     vec = [1, 2, 3]
     [3*x for x in vec]               ==>  [3, 6, 9]
     [3*x for x in vec if x > 1]      ==>  [6, 9]
     [x+y for x in vec for y in vec]  ==>  [2, 3, 4, 3, 4, 5, 4, 5, 6]

  Notice the implicit double loop in the last example. :)


Modules & Packages
------------------

* The 3-bits:

    * place a bunch of functions in a file 'foomodule.py'.
    * load the file -- either into another script or into the
      interpreter -- with the command 'import foomodule'
    * call the routines like so:  foomodule.func1()

  Ain't that simple?  Great for debugging interfaces.  ;)

* Gravy

  Each module has its own symbol table.  When you load a module, the
  funcs don't enter your own symbol table -- just the name of the
  module itself.

    * option:  import specific funcs *right* into your symbol table:

         from foomodule import func1, func2

    * module search path:  the script's own directory first, then the
      dirs in $PYTHONPATH, then the installation defaults

    * byte-compiled modules:  *.pyc, much like .elc files.

* "Standard" modules:  see Library Reference

  'sys' module is the most important, built into the interpreter:

     perl's @ARGV  ==>  sys.argv list  (but sys.argv[0] is the script
       name, like perl's $0; the real args start at sys.argv[1])
     sys.ps1 is the interpreter's prompt string.
     sys.path is the module search path as a list; it includes the
       dirs in $PYTHONPATH, and a script can append to it.
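
Here's a small sketch of the sys bits just mentioned; take it as an illustration rather than the one true way.  The helper name split_argv and the directory /tmp/my_python_modules are made up for this example, and the argv list is hard-coded (instead of using the real sys.argv) so the snippet behaves the same every time.  It sticks to syntax that runs under both old and new Pythons.

```python
import sys

def split_argv(argv):
    """Split an argv-style list into (script_name, args).

    argv[0] is the script name (perl's $0); everything after it is
    what perl would hand you in @ARGV.
    """
    return argv[0], argv[1:]

# Hard-coded stand-in for sys.argv, so the result is predictable.
script, args = split_argv(["myscript.py", "-v", "file.txt"])

# Hypothetical directory holding extra modules; appending it makes
# later 'import' statements search there too.
extra_dir = "/tmp/my_python_modules"
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
```

In a real script you'd call split_argv(sys.argv) instead of passing a literal list.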
* dir command:  lists exported API of module

     >>> import foomodule
     >>> dir(foomodule)
     [list of public vars and funcs defined in foomodule]
     >>> dir()
     [list of all names currently defined in your own symbol table]
     >>> import __builtin__
     >>> dir(__builtin__)
     [all symbols built into the interpreter, part of the
      '__builtin__' module]

* Packages, or, How to Imitate Java

  In a nutshell:  .py files can be organized in directory hierarchies
  so that each file takes on a module name with "dots", e.g.

     import MainPkg.subpkg.module
     MainPkg.subpkg.module.func1()

  or any variation of

     from MainPkg.subpkg.module import func1
     func1()

  To make this happen, each subdir must have an index file named
  __init__.py (it can be empty), which marks the dir as a package and
  can describe available modules and symbols.  Read all about it in
  tutorial sections 6.4 and 6.5.


Formatting Tools
----------------

* ` ` :  convert any value into a string.  This is actually the repr()
  function, but backticks are a shortcut.

* Methods from the "string" module

     import string

     string.rjust(str, x)       right-justify str in a field x chars
                                wide
     string.zfill('12', 5)      ==> '00012'   pad left of string
                                with 0's
     string.atoi(str)           convert string to int

  Note: it's generally better to use builtin string methods, rather
  than stuff from the string module.

* File I/O

     >>> f = open('filename', 'w')

  modes are any of: 'r'ead, 'w'rite, 'a'ppend, 'r+' to read and write.
  if mode is not given, 'r' is default.

     >>> f.read(size)     # read size bytes from descriptor
     >>> f.read()         # read entire file into RAM
     >>> f.readline()     # read one line of file
     >>> f.readlines()    # return a list of all lines
     >>> f.write(string)  # write string to file.  (see ` ` operator above)
     >>> f.tell()         # get file position
     >>> f.seek(offset)   # set file position
     >>> f.close()        # close descriptor

     >>> raw_input(str)   # ask user "str" question, read back reply.

* Pickle Module -- for persistent objects.  REALLY USEFUL.

  Allows you to serialize just about any data object (list,
  dictionary, etc.) into a string!
  One 'pickles' and 'unpickles' data:

     >>> import pickle
     >>> pickle.dump(x, f)     # dump object x into file f
     >>> x = pickle.load(f)    # read object x from file f


Regular Expressions  (read carefully!)
-------------------

* import re

* to do a regexp search:

     matchobj = re.search(pattern, string)

* for more efficiency:

     re_object = re.compile(pattern)
     matchobj = re_object.search(string)
     matchobj2 = re_object.search(string2)
     ...

  Essentially, a regexp pattern is compiled into a little "machine",
  and then you can pass strings into that same machine over and over.

* What's up with the "matchobj" thing?

  If a string contains the pattern, a MatchObject is returned.  Else,
  the None object is returned.  If you get a MatchObject, you can
  query the thing for $1, $2 groups and so on (matchobj.group(1),
  matchobj.group(2), ...).  See the "re" module documentation.


Exceptions
----------

Just Like Java.

     while x < 10:
        try:
           do blah
           do blah
        except BlahError:
           print "caught blah error"
        except BlooError:
           print "caught bloo error"

If the exception doesn't match, it's thrown upward to outer 'try'
statements.

Optional default handler --

        except:
           print "caught unknown error"
           raise    # re-raise the exception to my caller

Optional 'else' clause can follow the exception handling too, in case
you want code to execute if no exception is raised.

See section 8 of the tutorial on how to define custom exceptions and
cleanups, etc.


Classes
-------

There's a *real* class system goin' on in Python.  You can define
classes, instantiate them, subclass them, define constructors, get
multiple inheritance, etc.

But not now.  Not here, not yet.  :)


Debugger
--------

The interactive Python debugger is part of the standard dist, it's
called pdb.  See http://python.org/doc/current/lib/module-pdb.html.

The niftiest thing about pdb is that you can enter the interactive
debugger programmatically, even if you weren't running your program
under the debugger from the start!
Use the pdb.set_trace() method, like this:

     $ cat mydebug.py
     #!/usr/bin/env python

     import pdb

     def debug_at_7(i):
       if i == 7:
         print "About to enter interactive debugger:"
         pdb.set_trace()
       else:
         print "iteration %d" % i

     for i in range(0, 10):
       debug_at_7(i)
     $

Note that the stack frame you're in when you hit the debugger will be
inside the debugger code, so you'll have to go up a couple of frames
to inspect your own code:

     $ ./mydebug.py
     iteration 0
     iteration 1
     iteration 2
     iteration 3
     iteration 4
     iteration 5
     iteration 6
     About to enter interactive debugger:
     --Return--
     > /usr/local/lib/python2.2/pdb.py(904)set_trace()->None
     -> Pdb().set_trace()
     (Pdb) where
       /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(13)?()
     -> debug_at_7(i)
       /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(8)debug_at_7()
     -> pdb.set_trace()
     > /usr/local/lib/python2.2/pdb.py(904)set_trace()->None
     -> Pdb().set_trace()
     (Pdb) up
     > /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(8)debug_at_7()
     -> pdb.set_trace()
     (Pdb) up
     > /home/kfogel/src/subversion/tools/cvs2svn/mydebug.py(13)?()
     -> debug_at_7(i)
     (Pdb) list
       8         pdb.set_trace()
       9       else:
      10         print "iteration %d" % i
      11
      12     for i in range(0, 10):
      13  ->   debug_at_7(i)
     [EOF]
     (Pdb) print i
     7
     (Pdb) quit
     [...]
     $
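
One last bit, going back to the Regular Expressions section:  here's a minimal sketch of pulling perl's $1, $2, $3 out of a MatchObject with its group() method, and of checking for None when nothing matches.  The pattern, the sample strings, and the function name parse_date are invented for illustration; the code sticks to syntax that runs under both old and new Pythons.

```python
import re

# Hypothetical pattern: three parenthesized groups, like $1, $2, $3.
# Compiled once so the same "machine" can be reused on many strings.
date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_date(text):
    """Return (year, month, day) as strings, or None if no date is found."""
    matchobj = date_re.search(text)
    if matchobj is None:
        # search() found nothing, so there are no groups to query.
        return None
    # group(1) is perl's $1, group(2) is $2, and so on.
    return matchobj.group(1), matchobj.group(2), matchobj.group(3)

found = parse_date("released on 2002-05-17, more or less")
missing = parse_date("no date in here")
```

Testing "is None" before touching the match object is the important habit; calling group() on None raises an AttributeError.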