This time we're going through sequence data types, and all the kickass things you can do with them. Actually, you've kind of already seen a sequence type: strings. Strings are a sequence of characters. Technically, in Python, strings are a special case and different from other sequence data types, but you can still do a lot of sequencey things to them. Once you've seen a few, I'll introduce the
tuple data type.
The most basic feature of sequences is the ability to access a specific item inside them. This is done like this:
>>> "Hello"[2]
'l'
A critical property of indexes is that they start at 0, so "Hello"[0] is 'H' and "Hello"[1] is 'e'. This is actually pretty common in computing.
You can also, of course, index with a variable.
>>> i = 1
>>> "Hello"[i]
'e'
Exercise: make a program that asks the user for a string, and then a number, and prints out the character at that position in the string. (For extra fun, do it in one line.)
print(input("enter a string:")[int(input("enter a number:"))])
This might look hard to parse - indeed, it's bad enough that a serious programmer might do it on multiple lines just for readability's sake - so I'll dissect it for ya. Assuming I enter blah and then 3, the code can be parsed like this:
input("enter a string:") is replaced with the string entered, leaving:
print('blah'[int(input("enter a number:"))])
input("enter a number:") is replaced with the next string entered, leaving:
print('blah'[int('3')])
int('3') is replaced with 3, leaving:
print('blah'[3])
'blah'[3] is replaced with 'h', leaving:
print('h')
You might've already thought to try this and figured out how it works, but what happens if you use a negative index?
Negative indices start from the end. Note that this means they are not subject to zero-indexing, since -0 is the same as 0. 0 is the first element, 1 is the second element, -1 is the last element, -2 is the second-last. If you find this confusing, you're not alone :)
If you try to access an element that doesn't exist, such as index 4 of 'blah', you'll get an error (an IndexError). This is a good time to introduce the handy len function that can help you avoid this:
>>> len("hi")
2
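As a small illustration of how len relates to negative indexes (the variable names here are made up for the example), -1 picks out the same element as len(word) - 1:

```python
word = "Hello"

# -1 is the last character, -2 is the second-last.
last = word[-1]
second_last = word[-2]

# The last valid non-negative index is always len(word) - 1,
# because indexes start at 0.
same_as_last = word[len(word) - 1]

print(last, second_last, same_as_last)
```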
Exercise: modify the string indexing program so that it won't crash if the user asks for an invalid index, but print a message instead.
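One possible way to approach this exercise is sketched below. To keep it easy to run, the two input calls are replaced with fixed stand-in values, and the wording of the message is just an assumption; swap the input calls back in for the real program.

```python
string = "blah"  # stand-in for input("enter a string:")
index = 4        # stand-in for int(input("enter a number:"))

# Valid indexes run from -len(string) up to len(string) - 1.
if index < len(string) and index >= -len(string):
    result = string[index]
else:
    result = "there is no character at that position"

print(result)
```

With 'blah' and 4 this prints the message instead of crashing, since the last valid index of a four-character string is 3.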
Another super cool feature of sequences is slicing: the ability to index a range of elements at a time.
>>> 'pizza'[1:3]
'iz'
This gives us a string that starts at position 1 and stops before position 3, so we get characters #1 and #2. (You can think of a slice as always running from and including the start position, up to but not including the end position.)
If you omit one or both numbers of the slice, it goes to the beginning or end:
>>> 'pizza'[2:]
'zza'
>>> 'pizza'[:2]
'pi'
If you omit both, as in 'pizza'[:], you get the whole string, 'pizza'. An omitted start position is the same as 0, but an omitted end position is not the same as -1: -1 is the last item, so slicing up to -1 cuts it out.
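Here is a short sketch of the difference between omitting the end position and using -1 (the variable names are only for illustration):

```python
word = 'pizza'

whole = word[:]       # both positions omitted: the whole string
from_zero = word[0:]  # an explicit 0 behaves like an omitted start
no_last = word[:-1]   # stops before the last character

print(whole, from_zero, no_last)
```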
Okay, this is a rather obscure feature, but I might as well demonstrate it while I'm talking about this. You can have a third number inside the slice brackets, which specifies the "step" size:
>>> 'abcabcabcabcabc'[::3]
'aaaaa'
This slices from the beginning (because start position is omitted) to the end (because end position is omitted), selecting only every third character. You can think of the step size as defaulting to 1.
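A few more variations on the step, as a rough sketch (the variable names are made up for the example). The step can be combined with a start or end position, and writing a step of 1 is the same as leaving it out:

```python
letters = 'abcabcabcabcabc'

every_third = letters[::3]     # start at index 0, take every third character
from_second = letters[1::3]    # start at index 1 instead, so we get the b's
default_step = letters[2:8:1]  # a step of 1 behaves like letters[2:8]

print(every_third, from_second, default_step)
```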
Jargon: iterate: to loop with a sequence and do something with each element inside it. It can be used with either "on" or "over" as a preposition.
Exercise: use your knowledge of loops and indexing to write a program that gets a string from the user and then prints out each character inside it on its own line. (I'm about to introduce an easier way of doing this, but I want you to see how it can be done without it.)
string = input("give me a string:")
index = 0
while index < len(string):
    print(string[index])
    index += 1
Note that I couldn't put the input("give me a string:") that defines the variable string inside the condition, like while index < len(input("give me a string:")):. If you tried to solve this yourself before looking at the solution (which you should have), you probably ran into this. The reason that doesn't work is that a while loop's condition is evaluated every time it's checked, since it has to know when to stop. So every time it loops, it would ask: is index < len(input("give me a string:"))? And every time it asks that, it would execute input("give me a string:") to find out what its value was, which means the user would be asked to enter a new string after every iteration of the loop. The solution was to execute input("give me a string:") once at the beginning and store the value, so that when the while loop evaluates its condition, it's only asking whether index is less than the length of string, where string is the stored result of input("give me a string:"). This way, it doesn't ask the user for a new string every time.
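If you want to see the re-evaluation for yourself without typing anything in, here is a small sketch. It changes the string partway through the loop (an artificial thing to do, purely for demonstration), and the condition notices the change because len(string) is computed again on every check:

```python
string = "ab"
index = 0
seen = ''

while index < len(string):  # len(string) is recomputed on every check
    seen += string[index]
    if index == 0:
        string += "c"       # make the string longer mid-loop
    index += 1

print(seen)
```

The loop ends up visiting three characters even though the string only had two when the loop started.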
One of the most important keywords related to sequences is for. for is an alternate loop construction that makes iterating on a sequence much easier:
for letter in input("enter a word:"):
    print(letter)
Note that with for, the expression that tells it the sequence to be iterated (in this case, the result of input("enter a word:")) is only evaluated once, and then it just internally runs the loop with letter set to each character in that string. So with for it's safe to put the input in the loop line itself.
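A quick way to convince yourself that for only looks up its sequence once is the sketch below. It reassigns the variable inside the loop (again, only for demonstration), and the loop carries on with the original string anyway:

```python
word = "ab"
seen = ''

for letter in word:          # word is evaluated once, before the first iteration
    seen += letter
    word = "something else"  # has no effect on the loop already in progress

print(seen)
```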
Another problem you can solve now: make a program that gets a string from the user, and then a letter, and determines whether the letter is in the string.
string = input("give me a string:")
char_to_find = input("give me a single character:")
found = False
for char in string:
    if char == char_to_find:
        found = True
if found:
    print(char_to_find, 'is in', string)
else:
    print(char_to_find, 'is not in', string)
Yes, the problem I just made you solve was another unnecessary one :P You can use
in outside of the context of
for to test whether something is inside a sequence:
>>> 'e' in 'Hello'
True
>>> 'x' in 'Hello'
False
Well isn't that neat! I just wanted you to solve this problem the hard way as an intellectual exercise, and because many other languages don't have this keyword or anything equivalent to it. (C doesn't; Go only has it for strings, but not for other sequence types.)
Additionally, on strings,
in works with multi-character substrings. Check this out:
>>> 'He' in 'Hello'
True
>>> 'eH' in 'Hello'
False
Testing whether a multi-character string is inside of another string manually is a nightmare compared to this. (If you want, take a stab at it.)
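For the curious, here is one possible manual version, using only loops, len and slicing. It's a sketch rather than the way Python does it internally, and the names haystack and needle are just conventional placeholders:

```python
haystack = 'Hello'
needle = 'll'

found = False
start = 0
# Try every starting position where the needle could still fit.
while start + len(needle) <= len(haystack):
    # Compare the slice of the same length as the needle.
    if haystack[start:start + len(needle)] == needle:
        found = True
    start += 1

print(found)
```

Slicing does most of the heavy lifting here. Without it, you'd need a second loop inside the first to compare character by character.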
not in works the way you expect, even though, technically, you should expect it to be
not (x in y) (which does also work). After all,
x not > y is a syntax error. Basically, it's like
not in is an operator in its own right.
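A tiny sketch showing that the two spellings give the same answer (the variable names are only for illustration):

```python
vowels = 'aeiou'

with_operator = 'x' not in vowels       # reads like English
with_negation = not ('x' in vowels)     # same result, clunkier to read
vowel_check = 'e' not in vowels         # False, because 'e' is in there

print(with_operator, with_negation, vowel_check)
```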
Now that we're iterating on stuff, it's a very good time to introduce two handy keywords used in loops: the
break statement, which exits the loop immediately even if its condition is still true, and
continue, which skips the rest of the current iteration, and continues from the top of the loop. Here's a demo of both:
number = 0
while number < 10:
    number += 1
    if number == 5: # skip 5 for no reason
        continue
    print("the next number is", number)
    if input("want to see another? (y/n)") == 'n':
        break
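Both keywords work in for loops too. Here's a version that doesn't need any input, as a minimal sketch (the choice of which digits to skip and stop at is arbitrary):

```python
shown = ''

for digit in '12345678':
    if digit == '3':
        continue  # skip 3, but keep looping
    if digit == '6':
        break     # stop the loop entirely; 6, 7 and 8 are never reached
    shown += digit

print(shown)
```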
Tuples are a more general sequence data type. They store an arbitrary list of arbitrary values. The syntax for tuple literals is to enclose them in parentheses and separate elements by commas:
>>> nums = (6, 1, 4)
>>> nums
(6, 1, 4)
>>> nums[0]
6
>>> for num in nums: print(num)
...
6
1
4
>>> greetings = ("Hi", "Hello", "Good day", "Salutations")
>>> greetings[2]
'Good day'
>>> for greeting in greetings: print(greeting)
...
Hi
Hello
Good day
Salutations
>>> print(greetings[:2])
('Hi', 'Hello')
As you can see, tuples are subject to indexing, slicing, and the rest of the bag the same way strings are, but they aren't limited to holding strings; they can hold ints, floats, strings, Booleans, or any other type of value.
Warning! Declaring a tuple with only a single element isn't done the way you might expect! nums = (5) does not make a tuple; since parentheses are also used for grouping in mathematical or logical expressions, that statement would just set nums to 5. Python only interprets parentheses as enclosing a tuple if there's at least one comma inside (or if there's nothing inside). To set nums to a one-element tuple, you could do nums = (5,) - unnecessary trailing commas are permitted. Actually, you can even just write nums = 5,.
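To see the difference side by side, here is a short sketch (the variable names are made up for the example):

```python
not_a_tuple = (5)      # just the int 5 with redundant parentheses
one_element = (5,)     # the comma is what makes it a tuple
also_one_element = 5,  # the parentheses are optional
empty = ()             # the one case where no comma is needed

print(not_a_tuple, one_element, also_one_element, empty)
print(len(one_element), len(empty))
```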
You can also add tuples together:
>>> nums = (1, 2, 3)
>>> more_nums = (4, 5, 6)
>>> nums + more_nums
(1, 2, 3, 4, 5, 6)
Something I struggled with when learning Python was trying to add a single element to a tuple like nums += 5. This would raise TypeError: can only concatenate tuple (not "int") to tuple. Remember, since var1 += var2 is shorthand for var1 = var1 + var2, nums += 5 is saying nums = nums + 5. To add something to a tuple, the new addend has to itself be made into a tuple, like nums += (5,).
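A minimal sketch of the working forms, with the failing one left as a comment so the example still runs:

```python
nums = (1, 2, 3)

# nums += 5 would fail here, because 5 is an int and not a tuple.
nums += (5,)          # wrap the new element in a one-element tuple
nums = nums + (6, 7)  # the long form works the same way

print(nums)
```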
There is one difference in the way the
in operator works: with "real" sequences, like tuples,
in only tests if one of the members of the sequence after
in is equal to the element before
in. With strings,
in does "in a row" checking rather than "is a member" checking, so
"he" in 'hello' evaluates to
True, but with tuples,
('h', 'e') in ('h', 'e', 'l', 'l', 'o') or
(5, 3) in
(5, 3, 6) evaluates to
False, because none of the members of the tuple on the right is the tuple on the left. The reason for this behavior is that, as you may have guessed, you can have a tuple of tuples:
>>> high_scores = (("Alice", 1260), ("Bob", 1135), ("Carl", 1390))
>>> for score in high_scores:
...     print(score[0], 'scored', score[1])
...
Alice scored 1260
Bob scored 1135
Carl scored 1390
>>> high_scores[1][0] # demonstrating double-indexing: high_scores[1] is ('Bob', 1135)
'Bob'
Isn't that cool! Each element in
high_scores is a tuple that holds a name in position 0 and a score in position 1.
('Alice', 1260) in high_scores would evaluate to
True. (Strings don't have the concept of nested sequences in the way tuples do, so strings are the only sequence type covered here that has the "in a row" behavior for in instead of "is a member".)
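Here are the two behaviors next to each other, as a quick sketch (the variable names are only for illustration):

```python
letters = ('h', 'e', 'l', 'l', 'o')
high_scores = (("Alice", 1260), ("Bob", 1135), ("Carl", 1390))

substring = 'he' in 'hello'             # strings: "in a row" checking
run = ('h', 'e') in letters             # tuples: no member equals ('h', 'e')
member = 'h' in letters                 # tuples: 'h' is one of the members
nested = ("Bob", 1135) in high_scores   # a tuple can be a member of a tuple

print(substring, run, member, nested)
```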
This is also a good time to introduce a couple of minor features about line breaks.
When you need to break a statement across multiple lines, you're allowed to do so anywhere inside parentheses (or square or curly brackets), such as between the elements of a tuple:
names = (
    'Alice',
    'Bob',
    'Carl',
    'Dana',
    'Elijah',
    'Fiona',
)
But if the break isn't inside brackets, you need to use a backslash at the end of the line:
# This will raise a syntax error:
#sentence = "The " + "quick " + "brown " + "fox " +
#    "jumps"
# This works:
sentence = "The " + "quick " + "brown " + \
    "fox " + "jumps " + "over " + \
    "the " + "lazy " + "dog"
You can also put two string literals together without the
+, and it will be assumed:
>>> print("hello" "friend")
hellofriend
I don't recommend using this though. I find it less clear than using
+ and it's at most 2 characters shorter, and most other languages don't have it, so it's not a good habit to build. (It also only works on string literals, not string variables.) I honestly wish it wasn't in the language. It made me have to include this section to explain it, which costs both my time and yours.
So far, we've always put the block of an if, while, or similar keyword indented under the condition, but if it's only one line, you can actually do this:
>>> if True: print("logic has not been broken")
...
logic has not been broken
You can't nest them, though, even if they could theoretically all be on one line:
>>> for letter in "hi": if letter != 'h': print(letter)
  File "<stdin>", line 1
    for letter in "hi": if letter != 'h': print(letter)
                        ^
SyntaxError: invalid syntax
The most common time I use inline blocks is with
You should also be aware of semicolons. You can put multiple unrelated statements on one line by using a semicolon:
>>> a = 5; print(a)
5
You generally shouldn't, though, because it's less readable to have multiple, semantically distinct instructions on one line.
Another thing I'll talk about while we're on the topic of line continuations: Triple-quoted strings, enclosed on both sides with """ or ''', are allowed to span multiple lines without a backslash.
message = """Incoming transmission:
Hi, I hacked Yujiri's website and replaced his original example string with this!
Plz don't point this out to him. I'm wondering how long it'll be before he notices.
Also, I don't want him to plug his security hole :P"""
These are often used when you need to store a big message in a string, like help text for a command-line tool.
Quick trick: it's possible to assign two variables to the same value in one line without a semicolon:
>>> a = b = 5
>>> print('a is', a, 'and b is also', b)
a is 5 and b is also 5
This feature isn't useful very often, but I should mention it.
Convention: capitalizing variable names
By convention, variable names in Python are all-lowercase, but there's an exception. Constants (variables that are meant to never change) are often written in all caps. I'm saying this because I'm going to use it in the following project, and don't want it to look weird.
With that, you're ready to write a much more fun program than you did in the last chapter.
The government hires you to write a program to automatically censor messages that contain politically unacceptable speech. There's a predefined set of words you're searching for. Your program must accept multi-line input (a blank line signals the end of the message) and then output: the message with all lines containing dirty words removed; and some metadata for the overseer of the censorship department, including an account of which unacceptable words were found in the message (all listed on one line, without the parentheses you get when you print a tuple, but with commas placed appropriately), and the total number of characters that were removed.
Additionally, the filter should impose a message length limit of 280 characters (including the newlines, including the one at the end of the last non-blank line). If a message is longer than that, it should be cut off and terminated with "..." so that exactly the first 280 characters (including the added
...) of the message are outputted. In either case, a blank line should be printed between the message and the statistics.
This is supposed to be a fairly difficult assignment for someone with no programming experience outside of these three lessons. Give it some time. When I learned Python from the book that taught me, some of the end-of-chapter projects took me a few hours, but if you can solve a problem of this caliber on your own, then you're really catching on.
WORDS = ('free', 'liberty', 'tyrant', 'tyranny', 'oppress', 'rebel', 'revolt', 'revolution')
caught = ()
removed = 0
message = ''
while True:
    line = input() + '\n' # re-add the newline (which input leaves out) so we count it as a character
    # empty line signals end of input
    if line == '\n': break
    # since this is after we break out if the line was empty,
    # everything after this in the loop is dealing with a non-empty line.
    # next step: search for dirty words.
    censor = False
    for word in WORDS:
        if word in line:
            censor = True
            if word not in caught: caught += (word,)
    if censor: removed += len(line)
    else: message += line
# truncate the message
if len(message) > 280:
    # we have to re-add the newline that we cut off so there will still be
    # a blank line between it and the statistics. this means cutting off
    # 4 characters, not 3.
    print(message[:276] + '...\n')
    removed += len(message) - 280
else:
    print(message)
# statistics
words_caught_str = ''
for word in caught:
    words_caught_str += word
    if caught[-1] != word: # avoid putting a comma after the last word.
        words_caught_str += ', '
print("words found:", words_caught_str)
print("characters removed:", removed)