Previous lesson: Flow control
This time we're going through sequence data types, and all the kickass things you can do with them. Actually, you've kind of already seen a sequence type: strings. Strings are a sequence of characters. Technically, in Python, strings are a special case and different from other sequence data types, but you can still do a lot of sequencey things to them. Once you've seen a few, I'll introduce the
tuple data type.
The most basic feature of sequences is the ability to access a specific item inside them. This is done like this:
>>> "Hello" 'l'
A critical property of indexes is that they start at 0.
'e'. This is actually pretty common in computing.
You can also, of course, index with a variable.
>>> i = 1 >>> "Hello"[i] 'e'
Exercise: make a program that asks the user for a string, and then a number, and prints out the character at that position in the string. (For extra fun, do it in one line.)
print(input("enter a string:")[int(input("enter a number:"))])
This might look hard to parse - indeed, it's bad enough that a serious programmer might do it on multiple lines just for readability's sake - so I'll dissect it for ya. Assuming I enter
3, the code can be parsed like this:
input("enter a string:")is replaced with the string entered, leaving:
print('blah'[int(input("enter a number:"))])
input("enter a number:")is replaced with the next string entered, leaving:
int('3')is replaced with
You might've already thought to try this and figured out how it works, but what happens if you run
Negative indices start from the end. Note that this means they are not subject to zero-indexing, since -0 is the same as 0. 0 is the first element, 1 is the second element, -1 is the last element, -2 is the second-last. If you find this confusing, you're not alone :)
If you try to access an element that doesn't exist, such as
'blah', you'll get an error. This is a good time to introduce the handy
len function that can help you avoid this:
>>> len("hi") 2
Exercise: modify the string indexing program so that it won't crash if the user asks for an invalid index, but print a message instead.
Another super cool feature of sequences is slicing: the ability to index a range of elements at a time.
>>> 'pizza'[1:3] 'iz'
It gives us a string that starts at position 1 and ends at position 3, giving us characters #1 and #2. (You can think of this like a slice is always from and including the start position and up to but not including the end position.)
If you omit one or both numbers of the slice, it goes to the beginning or end:
>>> 'pizza'[2:] 'zza' >>> 'pizza'[:2] 'pi'
'pizza'. An omitted start position is the same as
0, but an omitted end position is not the same as
-1 is the last item, so slicing up to
-1 cuts it out.
An invalid slice range, where the start position is greater than the end position (like
[5:1]), will just quietly return an empty string.
Exercise: Try making a program that accepts a string and then prints it back limited to a specified max length. If the string has to be cut off, it should end with
... to indicate there's more - but the total printed length should be exactly the maximum, including the ellipsis. For example, if you choose 10 as the max length and I enter
Hi my friend, it should print
Hi my f....
limit = 10 msg = input("Type a string to truncate to " + str(limit) + " characters: ") if len(msg) > limit: msg = msg[:limit-3] + "..." print(msg)
Okay, this is a rather obscure feature, but I might as well demonstrate it while I'm talking about this. You can have a third number inside the slice brackets, which specifies the "step" size:
>>> 'abcabcabcabcabc'[::3] 'aaaaa'
This slices from the beginning (because start position is omitted) to the end (because end position is omitted), selecting only every third character. You can think of the step size as defaulting to 1.
Jargon: iterate: to loop with a sequence and do something with each element inside it. It can be used with either "on" or "over" as a preposition.
Exercise: use your knowledge of loops and indexing to write a program that gets a string from the user and then prints out each character inside it on its own line. (I'm about to introduce an easier way of doing this, but I want you to see how it can be done without it.)
string = input("give me a string:") index = 0 while index < len(string): print(string[index]) index += 1
Note that I couldn't put the
input("give me a string:") that defines the variable
string inside the condition like
while index < len(input("give me a string:")):. If you tried to solve this yourself before looking at the solution (which you should have), you probably ran into this, but the reason that doesn't work is that a
while loop's condition is evaluated every time it's checked, since it has to know when to stop. So every time it loops, it would ask, is
index < len(input("give me a string:"))?. And every time it asks that, it would execute
input("give me string:") to find out what its value was, which means the user would be asked to enter a new string after every iteration of the loop. The solution was to execute
input("give me a string:") once at the beginning, and store the value, so that when the
while loop evaluates its condition every time, it's only asking whether
index is less than the length of
string is the result of
input("give me a string:"). This way, it doesn't ask the user for a new string every time.
One of the most important keyword related to sequences:
for is an alternate loop construction that makes iterating on a sequence much easier:
for letter in input("enter a word:"): print(letter)
Note that with
for, the expression that tells it the sequence to be iterated (in this case, the result of
input("enter a word:")) is only evaluated once, and then it just internally runs the loop with
letter set to each character in that string. So with
for it's safe to put the
input in the
Another problem you can solve now: make a program that gets a string from the user, and then a letter, and determines whether the letter is in the string.
string = input("give me a string:") char_to_find = input("give me a single character:") found = False for char in string: if char == char_to_find: found = True if found: print(char, 'is in', string) else: print(char, 'is not in', string)
Yes, the problem I just made you solve was another unnecessary one :P You can use
in outside of the context of
for to test whether something is inside a sequence:
>>> 'e' in 'Hello' True >>> 'x' in 'Hello' False
Well isn't that neat! I just wanted you to solve this problem the hard way as an intellectual exercise, and because many other langugaes don't have this keyword or anything equivalent to it. (C doesn't; Go only has it for strings, but not for other sequence types.)
Additionally, on strings,
in works with multi-character substrings. Check this out:
>>> 'He' in 'Hello' True >>> 'eH' in 'Hello' False
Testing whether a multi-character string is inside of another string manually is a nightmare compared to this. (If you want, take a stab at it.)
not in works the way you expect, even though, technically, you should expect it to be
not (x in y) (which does also work). After all,
x not > y is a syntax error. Basically, it's like
not in is an operator in its own right.
Now that we're iterating on stuff, it's a very good time to introduce two handy keywords used in loops: the
break statement, which exits the loop immediately even if its condition is still true, and
continue, which skips the rest of the current iteration, and continues from the top of the loop. Here's a demo of both:
number = 0 while number < 10: number += 1 if number == 5: # skip 5 for no reason continue print("the next number is", number) if input("want to see another? (y/n)") == 'n': break
Tuples are a more general sequence data type. They store an arbitrary list of arbitrary values. The syntax for tuple literals is to enclose them in brackets and separate elements by commas:
>>> nums = (6, 1, 4) >>> nums (6, 1, 4) >>> nums 6 >>> for num in nums: print(num) 6 1 4 >>> greetings = ("Hi", "Hello", "Good day", "Salutations") >>> greetings "Good day" >>> for greeting in greetings: print(greeting) Hi Hello Good Day Saluations >>> print(greetings[:2]) ('Hi', 'Hello')
As you can see, tuples are subject to indexing, slicing, and the rest of the bag the same way strings are, but they aren't limited to holding strings; they can hold ints, floats, strings, Booleans, or any other type of value.
Warning! Declaring a tuple with only a single element isn't done the way you might expect!
nums = (5) does not make a tuple; since parentheses are also used as mathematical or logical operators, that statement would just set
5. Python only interprets parentheses as enclosing a tuple if there's at least one comma inside (or if there's nothing inside). To set
nums to a one-element tuple, you could do
nums = (5,) - unnecessary trailing commas are permitted. Actually, you can even just write
nums = 5,.
You can also add tuples together:
>>> nums = (1, 2, 3) >>> more_nums = (4, 5, 6) >>> nums + more_nums (1, 2, 3, 4, 5, 6)
Something I struggled with when learning Python was trying to add a single element to a tuple like:
nums += 5. This would raise a
can only concatenate tuple (not "int") to tuple. Remember, since
var1 += var2 is shorthand for
var1 = var1 + var2,
nums += 5 is saying
nums = nums + 5. To add something to a tuple, the new addend has to itself be made into a tuple, like:
nums += (5,).
In fact, you can also multiply tuples:
>>> nums = (1, 2, 3) >>> nums * 3 (1, 2, 3, 1, 2, 3, 1, 2, 3)
But this is very rarely useful.
There is one difference in the way the
in operator works: with "real" sequences, like tuples,
in only tests if one of the members of the sequence after
in is equal to the element before
in. With strings,
in does "in a row" checking rather than "is a member" checking, so
"he" in 'hello' evaluates to
True, but with tuples,
('h', 'e') in ('h', 'e', 'l', 'l', 'o') or
(5, 3) in
(5, 3, 6) evaluates to
False, because none of the members of the tuple on the right is the tuple on the left. The reason for this behavior is that, as you may have guessed, you can have a tuple of tuples:
>>> high_scores = (("Alice", 1260), ("Bob", 1135), ("Carl", 1390)) >>> for score in high_scores: ... print(score, 'scored', score) ... Alice scored 1260 Bob scored 1135 Carl scored 1390 >>> high_scores # demonstrating double-indexing: high_scores is ('Bob', 1135) 'Bob'
Isn't that cool! Each element in
high_scores is a tuple that holds a name in position 0 and a score in position 1.
('Alice', 1260) in high_scores would evaluate to
True. (Strings don't have the concept of nested sequences in the way tuples do, so strings are the only sequence type that have the "in a row" behavior for
in instead of "is a member".)
This is also a good time to introduce a couple of minor features about line breaks.
When you need to break a statement across multiple lines, you're allowed to do so if it's enclosed in parentheses or brackets:
names = ( 'Alice', 'Bob', 'Carl', 'Dana', 'Elijah', 'Fiona', )
(The comma after the last element is optional, but I always add it when I'm doing multiline things like this. I find it looks nicer and makes it easier to change the order of the elements.)
But if it's not with parentheses or brackets, you need to use a backslash at the end of the line:
# This will raise a syntax error: #sentence = "The " + "quick " + "brown " + "fox " + # "jumps" # This works: sentence = "The " + "quick " + "brown " + \ "fox " + "jumps " + "over " + \ "the " + "lazy" + "dog"
You can also put two string literals together without the
+, and it will be assumed:
>>> print("hello" "friend") hellofriend
I don't recommend using this though. I find it less clear than using
+ and it's at most 2 characters shorter, and most other languages don't have it, so it's not a good habit to build. (It also only works on string literals, not string variables.) I honestly wish it wasn't in the language. It made me have to include this section to explain it, which costs both my time and yours.
So far, we've always put the block of an
while, or similar keyword indented under the condition, but if it's only one line, you can actually do this:
>>> if True: print("logic has not been broken") logic has not been broken
You can't nest them, though, even if they could theoretically all be on one line:
>>> for letter in "hi": if letter != 'h': print(letter) File "<stdin>", line 1 for letter in "hi": if letter != 'h': print(letter) ^ SyntaxError: invalid syntax
The most common time I use inline blocks is with
You should also be aware of semicolons. You can put multiple unrelated statements on one line by using a semicolon:
>>> a = 5; print(a) 5
You generally shouldn't, though, because it's less readable to have multiple, semantically distinct instructions on one line.
Another thing I'll talk about while we're on the topic of line continuations: Triple-quoted strings, enclosed on both sides with
''', are allowed to span multiple lines without a backslash.
message = """Incoming transmission: Hi, I hacked Yujiri's website and replaced his original example string with this! Plz don't point this out to him. I'm wondering how long it'll be before he notices. Also, I don't want him to plug his security hole :P"""
These are often used when you need to store a big message in a string, like help text for a command-line tool.
Quick trick: it's possible to assign two variables to the same value in one line without a semicolon:
>>> a = b = 5 >>> print('a is', a, 'and b is also', b) a is 5 and b is also 5
This feature isn't useful very often, but I should mention it.
Convention: capitalizing variable names¶
By convention, variable names in Python are all-lowercase, but there's an exception. Constants (variables that are meant to never change) are often written in all caps. I'm saying this because I'm going to use it in the following project, and don't want it to look weird.
With that, you're ready to write a much more fun program than you did in the last chapter.
The government hires you to write a program to automatically censor messages that contain politically unacceptable speech. There's a predefined set of words you're searching for (for the exercise, you can pick what they are). Your program must accept multi-line input (a blank line signals the end of the message) and then output: the message with all lines containing dirty words removed; and some metadata for the overseer of the censorship department, including an account of which unacceptable words were found in the message (all listed on one line, without the parentheses you get when you print a tuple, but with commas placed appropriately), and the number of lines removed.
For example, if the words to be censored are "oppress", "free", and "rebel", and the message is:
I wanted to do X today, but it was illegal. This is terrible. We shouldn't be prevented from doing X. We need to do something about our oppression. What if we rebelled? I don't know, maybe I'm just getting crazy ideas. It wouldn't work anyway, right? Oh well, I guess there's nothing we can do about it. I'll have to go do Y instead. I didn't want to do Y, though. I just wish we had more freedom.
The program should output:
I wanted to do X today, but it was illegal. This is terrible. We shouldn't be prevented from doing X. I don't know, maybe I'm just getting crazy ideas. It wouldn't work anyway, right? Oh well, I guess there's nothing we can do about it. I'll have to go do Y instead. I didn't want to do Y, though. words found: oppress, rebel, free lines removed: 2
This is supposed to be a fairly difficult project for someone with no programming experience outside of these three lessons. Give it some time. When I learned Python from the book that taught me, some of the end-of-chapter projects took me a few hours, but if you can solve a problem of this caliber on your own, then you're really catching on.
WORDS = ('free', 'liberty', 'tyrant', 'tyranny', 'oppress', 'rebel', 'revolt', 'revolution') caught = () removed = 0 message = '' while True: line = input() # empty line signals end of input if not line: break # since this is after we break out if the line was empty, # everything after this in the loop is dealing with a non-empty line. # next step: search for dirty words. censor = False for word in WORDS: if word in line: # if we find a dirty word, mark the line to be # censored, but don't break since we want to see # if there are any other dirty words in it. censor = True # add it to the list of words we've # found if it isn't already in there. if word not in caught: caught += (word,) if censor: removed += 1 if not censor: # re-add the newline, since input strips it message += line + '\n' print(message) # statistics words_caught_str = '' for word in caught: words_caught_str += word if caught[-1] != word: # avoid putting a comma after the last word. words_caught_str += ', ' print("words found:", words_caught_str) print("lines removed:", removed)
Next lesson: Mutable types