
Week 1: Chapter Six: Strings

Building on Chapter 6, this class dives into Strings and data structures. Week 2 helps you install Python for hands-on practice, while Week 3 lets you jump ahead if you’re ready.


Welcome


Video: Video Welcome – Dr. Chuck

  • This page is part of the Python Data Structures course.
  • It covers chapters six through ten of the textbook.
  • The instructor encourages students to install Python for future classes, as the programs become more complex.
  • If you’re struggling, it’s recommended to go back and take the previous class.
  • The goal of the course is for students to feel mastery and understanding of the material.
  • The assignments in the first ten chapters should not be very difficult.
  • The instructor looks forward to seeing students and congratulates them on their progress.

Hello and welcome to my class
on Python data structures. This is the second class in
our Python specialization. It covers chapters six
through ten of the textbook. The previous class covered
chapters one through five. This is the last class in
the specialization that you can avoid installing Python. All along we’ve encouraged
you to install Python. Do all your homework on your
desktop or laptop, and then use our autograder
to turn things in. But starting with the next class,
not this class, but starting with the next class you’re
going to have to install Python. Because the programs get more complex. We’re using advanced Python features
that I can’t simulate in a browser. And so, I would say this is the time
to go ahead and install Python. Especially if you want to go on
beyond chapter ten in the book. So, if you are jumping into this class
and you find yourself struggling, like whoa, I’m picking this
up in the middle, go back and take the previous class. We just assume that you have
mastered the previous class. One of my whole goals in
design of this class, is for you not to try to rush through it,
but to feel a mastery of it. And feel free to do the same problem
over and over and over again, and hide the solution from yourself,
until you realize, oh wait a sec. This is easy. Because frankly,
up to this point and going forward,
none of this should be very difficult. And you should really understand
every single thing that’s going on in every one of the assignments. Especially in the first
ten chapters of the book. Later the programs become a little more
complex and even by the end of the book, in chapter 14 and 15. It takes me hours to
write these applications. And so, but here in chapters one
through ten, it doesn’t take me hours. It should take me, you know, five to
ten minutes at the most. And it eventually should take you
five to ten minutes at the most. So, welcome to the class. I look forward to seeing you and
congratulations on getting this far if you made it through the first class

Materials


Reading: Textbook

Reading

Lecture materials


Video: 6.1 – Strings

Summary of Chapter 6: Strings in Python

Key Points:

  • Last chapter for basic string manipulation.
  • Focus on understanding, not practical application yet.
  • Strings are sequences of characters indexed from 0.
  • Index operator ([]) extracts characters by position.
  • len(string) returns the length of a string.
  • Looping through strings with for and in:
    • More convenient than manually constructing loops with index.
    • Iterates through each character in the string.
  • Slicing strings with [] and expressions:
    • Grabs a specific section of the string with start and end values.
    • End value is not inclusive (“up to, but not including”).
    • Useful for extracting prefixes or suffixes.
  • Next: More advanced string manipulation techniques.

Additional Notes:

  • The speaker uses humor and personal anecdotes to illustrate concepts.
  • He emphasizes simplicity and conciseness in code writing.
  • Comparisons are made to algebra concepts for clearer understanding.

Welcome to Strings in Python!

Strings are everywhere in Python, from printing simple messages to building complex applications. This tutorial will equip you with the skills to manipulate and analyze strings like a pro. Are you ready? Let’s dive in!

1. What are Strings?

Imagine a box filled with letters. That’s basically what a string is in Python! It’s a sequence of characters enclosed in single (') or double (") quotes. For example, "Hello, world!" is a string containing 13 characters.

2. Accessing Characters:

Just like picking out a specific toy from a box, you can access individual characters in a string using index numbers. These numbers start at 0, so the first character of "Python" (P) is at index 0, and the last (n) is at index 5. You can use square brackets around an index to extract the character at that position, like this:

Python

my_string = "Python"
first_letter = my_string[0] # first_letter will be "P"
last_letter = my_string[5] # last_letter will be "n"

3. String Operations:

Python offers a variety of built-in operators, functions, and string methods to manipulate strings:

  • Concatenation: Joining strings together, like "Hello" + " world!" becomes "Hello world!".
  • Length: Finding the number of characters, like len("Python") returns 6.
  • Uppercase/Lowercase: Converting to uppercase ("python".upper()) or lowercase ("PYTHON".lower()).
  • Finding Substrings: Searching for a substring within a string, like "apple pie".find("pie") returns 6, the index where the match starts (or -1 if there is no match).
  • Replacing Characters: Replacing specific characters with others, like "banana".replace("b", "d") becomes "danana".
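
The operations in the list above can be tried out with a short sketch like the following. The variable names are illustrative assumptions and do not come from the course materials.

```python
# Illustrative values only; the variable names are assumptions.
greeting = "Hello" + " " + "world!"   # + joins strings and adds no space by itself
length = len("Python")                # number of characters in the string
shout = "python".upper()              # new string, all uppercase
quiet = "PYTHON".lower()              # new string, all lowercase
position = "apple pie".find("pie")    # index where the match starts
missing = "apple pie".find("z")       # -1 signals "not found"
swapped = "banana".replace("b", "d")  # returns a new string; the original is untouched

print(greeting, length, shout, quiet, position, missing, swapped)
```

Note that upper(), lower(), find(), and replace() are methods called on the string itself, while len() is a function that takes the string as an argument.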

4. Looping through Strings:

To process each character in a string, you can use a loop:

Python

for letter in "Python":
    print(letter) # Prints each letter on a new line
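
For comparison, here is a sketch of the two loop styles described in the lecture, using its banana example: a while loop that manages its own index, and a for loop that also counts the letter a. The variable names follow the lecture's wording, but the exact slide code is not reproduced in these notes, so treat this as a reconstruction.

```python
fruit = "banana"

# Indefinite loop: we build and advance the index ourselves.
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(index, letter)
    index = index + 1

# Definite loop: for/in hands us each character in turn.
count = 0
for letter in fruit:
    if letter == "a":
        count = count + 1
print(count)
```

The for version has fewer places to make a mistake, which is why the lecture recommends it whenever the position itself is not needed.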

5. Slicing:

Extracting a portion of a string is a breeze with slicing! Use square brackets with colon-separated indexes, like "Python"[2:4] to get “th”. Remember, the second index is not included.
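
A few more slices, sketched with the "Monty Python" string that the lecture's position numbers (0 through 11) appear to refer to. The variable names are assumptions made for this example.

```python
s = "Monty Python"

first_four = s[0:4]   # positions 0, 1, 2, 3; position 4 is not included
one_char = s[6:7]     # just the character at position 6
past_end = s[6:20]    # an end beyond the string is allowed; the slice stops at the end
prefix = s[:2]        # omitted start means "from the beginning"
suffix = s[8:]        # omitted end means "through the end"
whole = s[:]          # both omitted gives a copy of the whole string

print(first_four, one_char, past_end, prefix, suffix, whole)
```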

6. Putting it all Together:

Now that you know the basics, try combining these skills to solve fun tasks. For example, write a program that:

  • Takes a user’s name and prints a greeting message.
  • Counts the number of vowels in a given string.
  • Reverses a string to see if it’s a palindrome (reads the same backward and forward).
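
One possible way to approach these three tasks is sketched below. The function names are hypothetical, the vowel set is assumed to be a, e, i, o, u, and a fixed name stands in for user input so the sketch runs without typing anything.

```python
def greet(name):
    """Build a greeting message for the given name."""
    return "Hello, " + name + "!"


def count_vowels(text):
    """Count the letters a, e, i, o, u, ignoring case (assumed vowel set)."""
    count = 0
    for letter in text.lower():
        if letter in "aeiou":
            count = count + 1
    return count


def is_palindrome(text):
    """Reverse the text with a loop and compare, ignoring case."""
    lowered = text.lower()
    reversed_text = ""
    for letter in lowered:
        reversed_text = letter + reversed_text  # put each new letter in front
    return lowered == reversed_text


print(greet("Chuck"))
print(count_vowels("banana"))
print(is_palindrome("Level"), is_palindrome("Python"))
```

In a real program the name would typically come from input(), which returns a string.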

Bonus Tip: Explore online resources and practice on interactive platforms like Jupyter Notebook to solidify your understanding.

Remember: Practice makes perfect! So grab your favorite Python environment and start experimenting with strings. Feel free to ask questions and explore further. The world of strings in Python is waiting for you!

This is just a starting point. You can expand the tutorial by covering more advanced topics like formatting strings, regular expressions, and working with text files. Good luck and have fun!

Hello and welcome to Chapter 6. Now we’re going to talk about strings. This is really the last chapter
that I’m just going to ask you to please learn something without
exactly knowing how to do it. We’re just sort of chopping food. It’s like you’re going to be a
chef eventually, but we’re just chopping food. So Chapter 6 is the last chapter that you
just have to learn how to chop food. We’re going to actually make a
meal in Chapter 7. Once we have a file, then
all of the things that we’ve learned how to do are
going to come into play. So just trust me and listen
for one more chapter. So we literally have been using strings from the very first moment because the first thing
we did is print Hello world, and so, you know, this is a slide from a
couple of lectures ago. And so, you know, we take two strings, double quote, single quotes, we use the plus, remember it looks to the left, looks to the right, concatenates, remember that doesn’t put a space there. And here’s a string that has digits. And now we’re going to try to add 1 to it,
and it blows up. Yeah. You know, it’s hardly even – you’re not really sad when you see traceback
by now, hopefully. You’re just like, oh, traceback’s
a normal thing. I’m trying to learn. TypeError: cannot concatenate
strings and integer. It’s trying to tell you what’s going on.
And we’re all good. We then take the string, we pass it to the int function, and then that comes back with 123, and we add that and it becomes 124. So it’s all good, right? It’s all good, and we’ve been
doing that for a while. Another thing we’ve been doing is
reading data from input. The input function prints out a prompt.
That’s a prompt. We type something, and then whatever that is
comes back as the result of the function, and then it gets stuck into name, and so if we print that – print, of course, is also a function — we pass name in,
and we get out Chuck. Even if we enter some numbers, like 100, right, that doesn’t make apple an integer.
Apple is a string. Input gives us back a string, so we can’t subtract 10 from it.
Traceback again. But if we can convert it to an integer
and then subtract 10, then we can get that 100 minus 10
becomes 90. So we’ve been manipulating strings and using internal functions and converting them
to floats and doing this, that, and the other thing
as we have gone forward. But now, we’re going to start
tearing apart strings. So, the ultimate thing we’re going to do
is read through a bunch of data, tear that data apart, read line by line, and then look at
each line and find things in the line. So we need to know that a line of
characters, many characters, which turns into, would be a string, a multi-character string, has
indexable data within it. So, the string banana – and
I didn’t come up with banana. Actually, the book that I use is
based on a book by two people, Allen Downey and Jeff Elkner, and one of those two came up with banana. I would never have come up with banana
because I don’t know how to spell banana, and I’m terrified of having
a slide or the book with a mistyped banana because I just think somewhere in banana there’s
supposed to be two n’s. But, and I read this and it looks
like a misspelling to me, but I’m pretty sure that’s right. But that’s neither here nor there. We have this string banana, which is six characters,
and we stick it in fruit. And if we look in fruit, you can actually pull each character out. We call this the index operator, and the square brackets are
the index operator. And I pronounce this sub, so that’s fruit sub one. Now, as we look at the index, the first one is zero. Now that’s counterintuitive –
it goes right back to elevators in Europe that have
zero as the first floor. Right? Zero’s the first floor, so Python was invented
in the Netherlands – that’s Europe – and so
all the elevators are zero, so the first thing is zero. Actually, that’s not the reason at all. The reason has to do with
performance in computer science, where zero is easier to add than
subtracting one, but whatever. Just remember: the first thing
is sub zero. So this is a six-character string, but the last position is position five. You’ll get it; it won’t take you long. It will seem natural pretty soon. Right now, it seems unnatural. So fruit sub one, that means the character in position one. So, a ends up in letter, and so we indeed can verify that. This thing inside of the brackets
can be an expression, it can be a variable, it can be anything you want. There’s a constant,
here’s an expression. x is 3, x minus 1, that becomes 2, so fruit sub two comes down here to x
and we see the n that comes out of that. So, that’s the index operator. We pronounce it as sub, you know, fruit sub x minus one is how I
pronounce that last little bit. And so, it wouldn’t be Python
if we didn’t have a traceback, and in this one, I’m making a mistake. And that mistake is I’m going beyond abc, which is zero, one, and two. And so, zot sub five. No, sorry. Python is angry at us. Python is angry at us, and so we get an index error.
IndexError: string out of range. Oh, well, I mean string index – well
this is the string, that’s the index. That’s the word. We’re doing an index operator, a look-up operator, or a sub operator. So, that’s just a thing
you’re not supposed to do. After a while, you kind of get used to the idea that Python is just going to
traceback on you from time to time. There are a number of functions. We can pass a string into the len function, and we can get the length. The length of this is six characters. It is indeed six characters; even though it’s zero through five, it’s still six characters. So the len is just another function, we talked about functions before. Functions take as input some parameter, so banana is assigned into fruit,
and then we’re doing this. Remember, we evaluate the right-hand side here. This fruit gets passed into len, so the string banana is passed
into the len function. The len function does something
in the middle of it, and then the len returns us a 6
with the return statement, and then that 6 goes – that’s
an integer 6 – goes to x, so then we print it out
and we get the 6. Okay, so len is a function, takes
an input parameter, and away we go. And so inside of len, there is some code that takes this. It’s got a for loop or
who knows what’s in here, and then it’s got a return statement, and then it returns the stuff which then replaces this as the residual value
in the expression, and then the assignment statement finishes, and then 6 ends up in x, and away we go. So, there’s lots of things that you can do
with strings. Asking how long they are is one of the things that you
can do with a string. Now, we want to loop through a string. Well, given that we can have this
index operator sub, we can then generate a sequence of
numbers zero, one, two, three, four, five, and then we can
look up all of the things, right? And so to do that, you know, we got fruit banana and then
index – this is our iteration variable – and we’re
going to construct a loop, where we’re going to add 1, we’re going to increment index by 1. And then we’re going to say while index is
less than len of fruit – and that’s 6, not 5. And this will give us the numbers
0 through 5. So this loop will run with index
being 0 through 5. So the first time through it’s 0, the second time through it’s a 1, then a 2, and then we’re going to take the sub zero letter
and stick it in the string letter. Sorry, letter is a bad choice of a variable. This could be x, as long as this were x,
it doesn’t matter. It’s just letter is a reasonably mnemonic variable unless you’re
trying to give a lecture. The letter is letter. The
letter gets assigned into letter. So if I just said x, x, then I would say it looks up
the letter at the position zero and then puts
that letter into the variable x, and then we print out the variable x. Sometimes mnemonic, but there you go. So that’s going to run six times, with index zero, one, two, three, four, five, and each time it’s going to print out the index and the letter that happens to
be in that string at the index. So now we’ve got a loop that goes through
each of the letters in a string. Now, that was the indeterminate loop. We had to construct it. We had to make our own iteration variable, etc. etc. etc. But a much more convenient way, unless you actually need to know the position, just if you want to go through
all the letters in a loop, a much more convenient thing to do is
just use a for – a determinate loop. Right? So, we’re going to use for and in. And remember, in is like, you know, member of, for all the letters in the
set fruit, but in this case, it’s for the iteration variable letter taking on all the successive values
of the characters of fruit, so letter’s going to be b, then a, then n, then a, then n, then a. And that means it’s going to run
this loop six times, and each time through, letter is going
to be something different, and so it just prints this out. And we didn’t have to construct any of that
index stuff or any of the fancy stuff, we just rock and roll our way right through that. Here’s those two loops that I just showed you. Right? Here is the determinate loop
with the for and the in; and it’s nice and clean and they produce it. Here we construct index, have the while loop, use len, add 1 to index, pull the letter out, and so this line is the same as that line. So this is kind of like five lines
of code or four lines of code, and this is like two lines of code. And this might not seem like much, but there’s so many places that
you can make a mistake here, you know, if this is like
index + 2 or something. Now you do have more flexibility when you’re constructing it this way and sometimes
you do have to construct it, but to do the exact same thing, these two things are doing the exact same thing, and so it’s always better to use a
more succinct and direct way of describing your code rather than this. This is more like showing off how good
you are with the while loop, but it’s sort of unnecessary. So use the simplest bit of code that you
can use to accomplish what you want. It’s easier for you to write, it’s easier for you to debug, and it’s easier for someone else to
understand as they’re reading your code. So we can go back to the iteration chapter
and think of all the things that we did, whether it was look for the largest, look for the smallest, see if something’s there. What this is going to do is
this is a simple loop that’s going to go through and see how
many a’s are in a word. Now we happen to know by looking at it, but it gives you the sense of iteration. So we take, you know, letter’s going
to take on b a n a n a. It’s going to run this code six times. And if the letter is an a,
we do count = count + 1. We set it to 0 at the beginning. Remember how these loops do something at the
beginning, they do something in the middle, and then they have kind of like
the payoff at the very end. And so this just means every time
the letter’s a, we’re going to add 1 to count, so this effectively is counting the
number of a’s in the word banana, and out comes 3 because there are 3 a’s. Now if I misspelled it, there
would be more n’s and more a’s, but luckily on this slide, I think it’s spelled correctly. Now, I love this in, and we’re going to use this to do a lot of things when we deal with
files, when we deal with lists. This idea that in is kind of like this membership notion in algebra.
Not that you have to know algebra, but if you do know algebra, it’s like for x such that it’s
a member of this set. That’s the concept of in. It’s a very clean abstraction. Maybe you’ll actually learn Python, and then you go back and learn algebra
and you’ll go, like, Oh, yeah! This little member guy, that’s kind of like an in statement,
in statement in for. It’s a very abstract concept
that really says this is how we’re supposed to just run
this loop six times, you know, one, two, three, four, five, six, do it. Take care of all of the small details for me. Right? And this is again for me is
the magic of the for loop, it’s the Python for loop is – the for itself, the for loop does a couple of things. It decides how long the loop’s going to run, when the loop starts, when the loop stops, and it advances the iteration variable
automatically, so it decides, am I done? Go get the next letter, run it. Am I done? No, go get the next letter, go get the next letter, go get the
next letter, go get the next letter. Oh, now I’m done and I’m going to quit. Right? And so, the for takes all of this, all that logic is in one statement. And like I said, the less code that you have to write, the better off that you are. Now that I showed you how to
loop through strings. I want to show you ways that you
don’t have to loop through strings. And so, one of the things you do in strings is you basically want to grab a piece of the string. And so this is what we call slicing. And we’re going to use the same
square bracket to do slicing, except that we’re going to put an expression
in that tells us how far to go. So here we have a string, 0 through 11. Remember they start at 0. And so, in here, instead of saying s sub 0, which would be the first character, we say s sub 0 : 4. And so this gives us a range. Now, the key thing here is the end is
up to but not including. OK, up to but not including. So when we say 0, start at 0 and go up to but don’t include 4, that says up to but don’t include 4. So we don’t include 4. Now that, again, may seem counterintuitive, kind of like zero starting is counterintuitive, but I’ll bet you’ll see that
there are times when it sort of makes sense to do up to but not including. So for now, just remember up to but not including. So if we go 6 to 7, well, 6 starts here, and then up to but not including
doesn’t include the 7, so that’s why we get a capital P. And then, if we do 6 through 20, 6, starting at 6 going up to –
you’d think this would be a traceback, but it’s not a traceback. It is okay. After a while, you’re like, “I’m a little
disappointed in you, Python. You’re supposed to traceback
every time I make a mistake.” Well, somebody decided it was okay to
reference beyond the end of a string. And we’ll forgive you, it’s not going to get anything,
it’s actually going to stop there, and that’s why we get Python
as the answer here. Now, given that the beginning and
the end of the string are a very common thing you want, you want
a prefix or a suffix off of this string, it’s really common to either eliminate
the first number, which means the beginning of the string, or eliminate the second number of the range, which means the end of the string. So this basically says 0
up to but not including 2, so that’s Mo. And this one says 8 through the end,
which means thon. And then you can eliminate them both
and so it means the whole thing. Why do you want to do this? I don’t know. I say it’s syntactically there
just for completeness. So up next, we’re going to continue learning
how we can manipulate strings.
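
The two tracebacks discussed in this lecture, and the conversions that avoid the first one, can be reproduced with a sketch like this. The slide code is not shown in these notes, so the variable names are taken from the spoken description or assumed, and a string literal stands in for what input() would return.

```python
digits = "123"

# Adding an integer to a string raises TypeError. The exact message
# wording differs between Python versions, so it is only printed here.
try:
    digits + 1
except TypeError as err:
    print("TypeError:", err)

total = int(digits) + 1      # convert to an integer first, then add

# input() always hands back a string; this literal stands in for typed input.
apple = "100"
remaining = int(apple) - 10

# Indexing past the last position raises IndexError.
zot = "abc"                  # valid positions are 0, 1, and 2
try:
    zot[5]
except IndexError as err:
    index_message = str(err)
    print("IndexError:", index_message)

print(total, remaining)
```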

Video: 6.2 – Manipulating Strings

Summary of the Lecture on Strings in Python:

Key Points:

  • Strings are sequences of characters enclosed in quotes.
  • Basic operations include concatenation, length, upper/lowercase conversion, finding substrings, and replacing characters.
  • in operator checks if a substring is present in a string (returns True/False).
  • Comparison operators work on strings: == compares character by character, while < and > follow character-set order, so uppercase letters sort before lowercase ones.
  • The string library offers additional capabilities like lower(), replace(), and find().
  • dir() and type() reveal methods and classes associated with strings.
  • Slicing with [] extracts specific portions of a string based on index positions.
  • Whitespace includes spaces, tabs, and newlines (can be removed with strip()).
  • startswith() checks if a string starts with a specific prefix.
  • Combining these techniques like finding and slicing allows for complex string manipulation.
  • Python 3 uses Unicode strings for wider character set representation compared to Python 2.
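
Several of the key points above can be checked with a short sketch. The banana, Chuck/Glenn, and Hello Bob values echo the lecture. The line and record strings are hypothetical examples added here to illustrate strip(), startswith(), and combining find() with slicing.

```python
fruit = "banana"

# in as a logical operator: gives back True or False
has_n = "n" in fruit
has_m = "m" in fruit
has_nan = "nan" in fruit           # substrings work too

# Comparisons follow character-set order: uppercase sorts before lowercase
z_before_a = "Z" < "a"
sorts_as_expected = "Chuck" < "Glenn"
sorts_surprisingly = "chuck" < "Glenn"   # False: lowercase c comes after uppercase G
same_ignoring_case = "Chuck".lower() == "CHUCK".lower()

# String methods return new strings; greet itself never changes
greet = "Hello Bob"
nstr = greet.replace("Bob", "Jane")
xstr = greet.replace("o", "X")     # replaces every o
pos = fruit.find("na")             # position of the first match
print(type(greet))                 # shows the str class
has_find = "find" in dir(greet)    # dir() lists the available methods

# Hypothetical inputs for whitespace handling and find-plus-slice
line = "  Please have a nice day \n"
cleaned = line.strip()             # removes leading and trailing whitespace
starts = cleaned.startswith("Please")
record = "name: Chuck"
colon = record.find(":")
value = record[colon + 1:].strip() # everything after the colon, trimmed
```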

Additional Notes:

  • The lecture uses interactive demonstrations with Python code.
  • It emphasizes building practical skills for manipulating strings in real-world applications.
  • Future chapters will focus on using these techniques for more advanced tasks.

Overall:

This lecture provides a comprehensive introduction to string manipulation in Python, laying the foundation for more complex string handling in future chapters.

So now, we’re going to do some
more things with strings, because string manipulation is
a lot of what programs do. Number manipulation is one
kind of program and string manipulation is generally another
thing that we do in programs. So let’s just…
the +, we’ve been doing that, where it sort of looks to the left, looks to the right, concatenates. Remember, there is no space in here. If you say, like, print(x,y), the two things come out and there’s
a space in between them. But that’s not what’s going to happen here. This + says concatenate these two things
and it literally does it. So if you want to put a space in between,
you have to say, you know, a concatenated with a space concatenated with There and so we’ve explicitly put the space in. So, string concatenation truly concatenates
the strings together. If we added the space automatically then
you’d need a way to suppress that behavior. So you’d need some other operation that
concatenated strings without a space. So we just say we’ve got one way of doing it, and if you want a space, put the space in there. The in, which is, I love so much in the for loop, is also usable as a logical operator. The expression is a little bit different. And so, instead of like, you know, in, sort of variable in, it’s like this is asking that question. And it’s very much in a way like, you know, double equals, which is a question,
or not equals, you know, these are leading back,
giving us back True/False or like less than or less than or equals,
or something like that. These are all questions. Is this true? Yes or no? And we use them in if statements. And so, here we make the variable fruit, and we ask the question:
Is the string ‘n’ in fruit? And so, in is an operator here. It’s like ==, but it’s really looking through
and saying, is the letter n in fruit? And the answer is yes, it is, ’cause there’s that n there, and so we get back a True. Is ‘m’ in the contents of the variable fruit? The answer is no, so we get back a False. And now it doesn’t have to be a
single character. We can ask for a substring, and say, is ‘nan’ inside fruit? The answer is yes, there is, and so we get a True back. And so it’s pretty smart. It can scan, it finds whether or not
these things are in there. And we tend to build these things
where we use them in if statements. You know, if ‘a’ is in fruit, then print Found it! So in this case, that’s an expression
that evaluates to True because a is in fruit and so
this code executes and away you go. Just a little note, if you’re using
the interactive interpreter, and you’re using – you actually
have to throw a blank line here. You don’t need a blank line in real Python,
like if you’re writing in a file. But, you know, if you type this
and then you indent that, it works, and then you hit enter here, then it will actually run that
whole block of code. You have to give this blank line
to convince it. It’s a situation where the interpreter,
the chevron prompt, is slightly different than a Python
syntax in a Python script. No biggie, but I bet by now you’ve
probably figured that out. OK, you can compare strings. They make a lot of sense. Equal sign, you know it just, you know, compares character for character. Less than and greater than have to do with the character set of your computer and the character set that Python
is configured to use. So less than, if you recall, we did max and min, and we learned that the uppercase letters
are generally less than lowercase letters. Right? So uppercase Z is less than lowercase a. And that’s going to happen if you do
upper, greater than or less than. Now, the thing that works, for example, that is, you know, consistent, is if you have something like Chuck with
uppercase C and Glenn with uppercase G, oops, and lowercase everything else,
it’s going to sort right, because all the uppercase letters sort the right way and the lowercase
letters sort the right way. And, but chuck and Glenn, chuck and Glenn will sort the
wrong way because the G is going to sort before the c, and so… But, it’s ok. It makes some sense,
it’s all consistent. But you can do this. But certainly
== works just peachy fine. So, those are the sort of the
basic operations we can do, but there’s a whole bunch of
additional capabilities that are part of what we call
the string library. And it has to do with the idea
that strings are objects. And later, we’ll learn what objects are
and learn a lot of stuff. Just for now, objects are these
kinds of variables that have capabilities that are kind of
grafted onto or built into them. And so, inside, once we put a string in greet, Python knows that’s a string. If you use the type command, it would say, oh, class str. That str confers certain benefits and
privileges that strings are capable of doing that are different
than what integers can do and different than what files can do
and other kinds of types. Right? And one of the things they can do is, you can say greet.lower(). Now, it’s almost like saying, like calling a function called lower()
and passing greet into it. But this is slightly different syntax. This is, this is run a function lower()
that’s part of the string object, of the string class, that is going to give us back a lowercase copy. So what this functionally does, it says, make a copy of greet but all lowercase
and return it to us, and then we’re going to store that into zap. And so, if we print this out, it’s basically all lowercase. And if we take a look at what’s in greet, we see the greet is unchanged, because this was make a copy that’s
lowercase, a lowercase copy. And so it doesn’t change the original. And even a constant is a legit object, and we have it, a lower method inside of this. So this just prints “Hi There”.lower() which gives us ‘hi there’ all lowercase. So we’re good. OK? And so even constants have this
sort of built-in capability. When we get there, we’re going to call,
you can look this up, look at object method, method. This thing is a method. Or you can look ahead in an upcoming
chapter and figure out what methods are or you can just look
on the Internet to figure out what methods are. And we will cover this in much greater
detail when we get there. This is foreshadowing. Now these are the dir and the type are
things we’ve done before. And so the type says it’s a type str. Class is an object-oriented term that basically says this is a thing that’s
of the category string. And now dir says, what are strings capable of? And there’s actually a bunch of things.
I don’t show them all. But these are a bunch of methods
in the class str. This is a light version of
an object-oriented lecture. And these are just the things you can do. So it’s stuff dot blah blah blah (). OK? replace, rfind, rstrip, they’re all here. There’s a whole bunch of them. And, this is, dir is not the best
documentation for these things, but Python, of course, has wonderful
online documentation that explains it. So they tell you what the parameters are, and str is whatever variable like, you know, x or y. You know, and y.replace() blah old new, y.rjust() for justification, or split. We’ll play with lots of these things. So we’ll take a look at some of the
more common things that we do with the string library. Capitalize, which takes a string like, you know, abc, and makes the first letter Abc, or you could even have ABC as input
and then output makes Abc capitalized. Whenever it’s done, the first
letter’s capitalized. That’s what capitalize does. Why you want to do that? Whatever. It’s already built in. You could write a for loop to do that, but it’s already built in, so we’ll take a look at some of these things. One of the most common things that we do
is use the find method. And it’s kind of like in, except that instead of returning True/False, it returns where it found it. So, the in says, is the ‘na’ inside banana? Or, find says, where inside the banana is ‘na’? So, we say fruit.find(), it’s a method within strings, and we pass in ‘na’, and then Python goes looking through
here and says, oh, there’s an na right there, starting in position 2. Now, it doesn’t say there’s a bunch of them. Later we’ll figure out how if you
really want to find a bunch of them, you can call regular expressions.
More foreshadowing. But you find the first one and
it comes back with a number 2. So, the position is where the na is
positioned within the banana is position 2. But that’s actually the third letter,
so don’t forget that. If you look for something that’s not there, no z, you get back -1, so that’s our little indicator, or our flag that basically says,
did not find it. OK? So that’s the find operation. We already have played with the uppercase
and the lowercase. There is an upper that is effectively shouting. Remember that greet doesn’t change. There’s a lower that goes all to lowercase. Sometimes, I tend to use these when I don’t
exactly know and I want to do an if test to say if here’s a string and here’s
a string and I want to ignore the case, I say if the string to lower, if the string lower is equal to the other
string lower then I know that they’re both lower. That they match, ignoring case. Search and replace. So this is an example where we have, you know, Hello Bob is in this variable. And we’re going to call the
replace method inside of the greet variable and give it
two parameters. In this case, we’re going to give it
an old and a new. So that says go find all the
Bobs and replace them with Janes. It doesn’t hurt greet,
greet doesn’t change, it gives us back a copy and in that copy, all those characters are replaced, so the, what’s in nstr is Hello Jane. We can replace all the o’s with X’s. So that goes here, and it replaces that and replaces that. Of course, greet’s unchanged, but then we get a copy of it with the o’s
replaced with X and put that in nstr, and so that’s where we get
Hello Bob with X’s instead of o’s. And it just shows that
this is a sort of multi-replace. It goes through and finds all of
the o’s and replaces them. And there was only one Bob,
but it would have found, if there was more than one Bob, it would have fixed all the Bobs
and changed them to Janes. Whitespace is something that we see. The best way to think of whitespace
is it’s like spaces. But there are other things
that qualify as whitespace, like newlines or tabs. There’s other characters that
you’ll find in strings, especially if you start
reading them from files, and so they’re sort of crufty bits. The way that I think of whitespace is like here’s something printed out and it’s like abc def. Well, there’s something here, and you can’t see it. It’s like a clear letter.
That’s what whitespace is. If this were a white piece of paper, it would be whitespace. It might be a tab, might be a bunch of spaces, whatever, it’s whitespace. That’s what whitespace means. It affects mostly spaces but there are
a few other characters that do it. So here we have a string that’s got
spaces at the beginning and end. And strip pulls off the characters from
both the beginning and the ends. Whoosh, whoosh. It doesn’t hurt the original variable, it just gives us back a copy with nothing there. And we can strip from the right side if we
want and then we can strip from the left side. And so that’s a way of, you know, otherwise, you’d be writing loops to
get rid of the whitespace like, oh, what if there are four characters or, you know, four characters in the
beginning and I’m going to throw them away? Three characters in, throw them away. I’d be writing a for loop, do this other thing, concatenate
these things together. It’s like you know, why didn’t they
write a library for it? Ah, yes, they did. They did. They wrote a library for it.
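As a rough illustration of the three stripping methods, here is a small sketch. The sample strings are assumptions made for this example; the lecture slide itself is not reproduced in the transcript.

```python
greet = '   Hello Bob  '

# Each call returns a new string; greet itself is never modified.
left_stripped = greet.lstrip()    # whitespace removed from the left only
right_stripped = greet.rstrip()   # whitespace removed from the right only
both_stripped = greet.strip()     # whitespace removed from both ends

# Tabs and newlines count as whitespace too, not just spaces.
messy = '\tabc def\n'
cleaned = messy.strip()

print(repr(left_stripped), repr(right_stripped), repr(both_stripped))
print(repr(cleaned))
```

Whitespace in the middle of the string (the space between abc and def) is left alone; only the ends are trimmed.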
And so, we’re in good shape. So, again, these libraries, you’ve got to use them, otherwise
you’d be writing crazy loops. It’s a real common problem
to be scanning through a file and want to know
only the lines that start with a prefix. And there is a built-in method called
startswith, line.startswith(). This takes a parameter, what prefix we’re looking for. And in this case, we get True back
because it does start with Please. Does it start with a lowercase p? And then we get back a False because, no, it doesn’t start with a lowercase p. So, it is a True/False. We tend to use that in the if…
startswith(): do something to the line, and that way we skip a bunch of lines except the ones that start with the
prefix that we’re looking for. Now, let’s put some of this together.
Some find and some slicing. So slicing is the word for using
that : operator. So let’s take a look. So, here is a big, long string. And you’re going to see a lot of these
in the later chapters of the book. We’re obsessed with email messages in this class. And so here is the first line of a
bunch of email messages. The format is the word From, space, then an email address which includes
a name and an @ sign which is the organization, a space, and then a date and time which it was sent. This is actually a real email message
from a real person. That’s Stephen right there. If you’re ever in Cape Town, stop by UCT. He’s there, that’s where he’s at. I’ve been to UCT and I said hi to Stephen. People who’ve taken this course
actually know Stephen. That’s crazy, right? People who have taken this course are
from South Africa and they know Stephen, and they walk up and say, “Hey, Stephen! You’re in Dr. Chuck’s lecture.” Yes, Stephen. You are in Dr. Chuck’s lecture. OK, but that’s not the point. We’re learning Python. OK. So, what I’m interested in is I want to extract
this little bit from here to here. I want to go one character after the @ sign up to, but not including the next space. So, we’re going to take a couple of steps. First, we’re going to say, OK, let’s find the @ sign. Where is that? Python goes and says, oh, that’s in position 21. So it returns 21 back in there, and so we get 21. That’s the start. The character after that
is where we want to start. Now, the next thing I want to do is
I want to say where is the next space after this? Well, it turns out in find you can put
a second parameter in, and say that’s where to start. So this is starting here and looking
for a space and says, oh, I just found you a space starting at the @ sign. So down comes 31 into here. We have basically 21 to 31, which kind of boundary, that gives us the boundary. And here’s the fun part. We want to slice this out. So slicing is like chop, chop. So we’re going to go one beyond the @. So that’s going to give us the little u at
position plus 1 through space position. But it’s not really space position. It’s up to, but not including the space position, look how nice that came out, right? So up to but not including the space position. We get exactly what we want, not extra stuff, and we get back the piece that
we were trying to pull out. So you’ll see how we sort of put these things together. I must have written this same code 20,000 times
in the last 30 years, of search for something, search for the start of something, search for the end of something, pull the thing out, search for the
start of something, search for the end of something,
pull the thing out. And we’ll find that there are actually
better ways to do this, but this is kind of low-level, sort of doing it the hard way in Python. So this is just a little bit. We’re talking about Python 3, but some of you may have to work
in Python 2 from time to time. And so, one of the real advantages
of Python 3 is that all the strings internally are
what are called Unicode, which means that they can represent
a wide range of character sets. In Python 2, strings sometimes
have to go through conversions. There are two kinds of things. And so in Python 2, there were
regular strings and Unicode strings. And so, you would indicate a Unicode
string by adding this u prefix. And these were different. And sometimes when you read these
from files or wrote those to files, you’d have to kind of go through some
conversion and it was a little bit weird. But the interesting thing is that in Python 3 regular strings
and Unicode strings are all just strings. So, every string inside Python 3 is capable of representing all character sets,
and that’s kind of cool. There will still be some explicit
conversion we’ll have to do. But the conversion in Python 3, when we start talking to databases
and reading data off of networks, there will be conversions we’ll have to do, but those conversions will actually make far more sense than the way
you had to do it in Python 2. So if you took my class in Python 2
and you’re like, “Here, use this buffer thing,” and you say, “Why should I use the buffer thing?” I’m like, “Uh, ’cause if you don’t use
the buffer thing, it won’t work.” At least in Python 3, we have a sense of
external data coming from outside your computer. It needs to be dealt with in a certain way
and it’s quite predictable. So Python 3 does a really much better job
on character sets than Python 2. It’s possible in Python 2, but it wasn’t as easy. That’s a quick run through strings. We talked about the types and searching and looping. There’s a lot to it. We still haven’t done anything useful and that’s what we’re going to do
in the next chapter.
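The find-then-slice walkthrough from earlier in this lecture (find the @ sign, find the next space, slice between them) can be sketched as follows. The From line is assumed to be the one from the textbook, since the transcript does not show the slide text; the variable names atpos, sppos and host are also assumptions.

```python
# Assumed sample line: "From", an email address, then a date and time.
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Step 1: where is the @ sign?
atpos = data.find('@')

# Step 2: where is the first space at or after the @ sign?
# The second argument tells find() where to start looking.
sppos = data.find(' ', atpos)

# Step 3: slice from one past the @ up to, but not including, the space.
host = data[atpos + 1:sppos]

print(atpos, sppos, host)
```

This sketch does not guard against a line with no @ sign, where find() would return -1 and the slice would produce something unintended.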

Review: Chapter 6


Assignment: Chapter 6


Graded App Item: Assignment 6.5

Code

Video: Worked Exercise: 6.5

Summary of Python Exercise 6.5: Parsing Text Strings

Objective: Extract a floating-point number from a string containing a colon.

Steps:

  1. Identify pattern: Locate the colon using str.find(':').
  2. Extract substring: Grab the portion after the colon and the space that follows it using the slice str[ipos + 2:] (where ipos is the colon’s position).
  3. Convert to float: Transform the extracted string to a floating-point number using float(piece).
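
The three steps above can be sketched as follows. This assumes the input line has exactly one space after the colon, as in the video. The video names the variable str; the sketch uses text instead so that Python's built-in str type is not shadowed.

```python
# Assumed input: one space between the colon and the number.
text = 'X-DSPAM-Confidence: 0.8475'

# Step 1: locate the colon.
ipos = text.find(':')

# Step 2: skip the colon and the following space, keep the rest.
piece = text[ipos + 2:]

# Step 3: convert the extracted string to a floating-point number.
value = float(piece)

print(ipos, repr(piece), value)
```

If the number of spaces after the colon varied, slicing from ipos + 1 would also work for this conversion, because float() ignores leading and trailing whitespace.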

Key Takeaways:

  • String parsing involves identifying patterns and extracting specific parts.
  • Slicing allows precise extraction of substrings based on index positions.
  • Converting strings to other data types like floats requires appropriate functions.
  • This exercise lays the groundwork for more complex data extraction from various sources in later chapters.

Additional Notes:

  • The lecture emphasizes printing intermediate values repeatedly as a debugging aid during string manipulation.
  • Relative directory navigation in the terminal (cd ..) is demonstrated before running the script.
  • Future chapters will cover file handling, databases, and web data access for more intricate data extraction tasks.

Hello and welcome to Python for Everybody. My name is Charles Severance, and I’m the author of the book and
the teacher of this class. In this particular session we are going
to do Exercise 6.5 from the textbook. It’s an exercise in parsing text strings. And so the basic idea is we’re
going to see strings of various kinds, and various lengths and we’re going to
want to extract pieces of them, okay? And so the idea is to somehow
get this part out, and then convert it to
a floating point number. This is a proxy for later things,
where we’re actually reading files or reading stuff off the Internet, but
parsing strings is an important thing for us to do. Okay? And so let’s take a look at
a couple of different ways to do this. So let’s go ahead and get started,
let’s go bring up our Atom and I’ve got it open nicely to the right spot
here, and I’m going to make a new folder, ex_06_05. Hopefully by now you’re
finding Atom, or whatever your programmer
editor, is sort of a powerful tool. I’ll close this one. File > New File. A powerful tool that lets you sort
of save a lot of keystrokes, etc, etc. print('Exercise 6.5'), just for yucks. And this file, Save As. And again, until I save it,
it’s not going to have the pretty colors. I’m going to save it in ex_06_05 as ex_06_05.py. And now it has the pretty colors, and
here I am. Now, I’ve been doing these, and so now I’m actually
already in a directory, so let me show you how to
do relative directory. So I’m in this path right here. And I can use both in Windows and
in Mac and in Linux. I can use the command cd ..,
that sort of thinks of the one before. The one that came before. And so now I’m up one directory. And if I do an ls, I will see that
this new ex_06_05 that I just created in this directory from Atom is
already there, cd ex_06_05. In the next chapter,
we will be talking about files. And this is where you really need to
know this concept of folders and files. So ls, and
I’m going to run python3 ex_06_05.py and there we go, exercise 6.5. So we’re sort of in the right spot, we’ve
got this going, and we’ve got this going. Pretty soon we’ll be putting stuff in
the directories that need to be there and you’ll see how all
that’ll work in a second. Well, in the next chapter where
we’ve got to know all this stuff. Okay, so we’ll just grab you,
this first line here. And paste that in. print(str) So. That’s right, and there we go. Actually, there’s supposed
to be a space right there. So I don’t know why this space didn’t get
copied and pasted from my copy and paste. So I’m going to put that space in. There’s supposed to be a space right
there, I think, but we’ll see. So the key thing is if you look at
the lectures from this section, you can like look for things,
and you look for a pattern. And so what I’m going to do
is I am going to look for a pattern that says find me a colon. Okay? And I’m going to say ipos = str.find(':'). I’m going to print out ipos. So I’m going to say,
where in this string is there a colon? That’s going to give me the position and
offset of that. So that says that the colon
is in position 18. Now, it’s not always going to be 18, sometimes these strings will be
a little bit different, okay? So, the next thing I can do is I can say,
piece = str[ipos:], a small piece of this string starting from that position,
ipos, through the end of that string. And then we’re going to print that out,
print out the piece. And when I’m doing string parsing,
tearing strings apart, I tend to have a lot of situations where
I print over and over and over again. So now let’s see if that
piece is the right piece. And, the answer is,
it doesn’t quite look right, because, see, I’ve got that colon there? And, that’s because it says start at 18,
position 18, wherever that is, and then keep on going. And, so I need to do ipos plus 1
so let’s see. I will just sort of advance past
this little colon character and get into that space, okay? So let’s run it.
So now I’ve got space 0.8475, and now I can just see if value equals
float(piece) because piece is a string. It’s a string, and then I’m going to say print(value)
to see if I got the value right. And let’s remember that
there’s a space here. This might mess up float. I don’t think it’s going to mess up float,
because float’s trying to find a floating point number and it kind of,
but let’s just see if it works. Let’s just see if it works. Okay, so
the key is the colon is in position 18,
the string we pulled out is blank 0.8475, and the floating point number is 0.8475,
so we’ve sort of solved this. Now, I can clean this up a little bit by
making that plus 2, so I’ll just change that to plus 2, and you’ll see
how that changes what I’m doing. And so now this here is the string,
that one there is the string, this is the actual floating point number,
they’re the same thing, other than the fact that it’s a floating point
number, and you can add something to it. So I could do something
like print(value + 42.0), and that would actually work, right? So 42 point, and if I did print(piece + 42.0) that will blow up, right? Because piece is a string and 42.0 is a float and it says can’t convert float object to string implicitly. Okay? And so other than sort of
taking out this extra stuff. I’m just commenting out a whole
bunch of stuff here. Oops. So I take out all those print statements. These five lines are the lines to do
this particular assignment, where we are tearing apart a string. For now,
this is just so that we can play with strings, but later we’ll be
taking this data from all over the place. Finally we’re going to start
opening some files and then later in the course we’re going to be
opening data from databases, we will be opening data from
the Internet, and so on. There are all kinds of sources of data
where we get these strings. But for now, we’re in Chapter 6 and
we’re only focused on strings. So I hope you found this useful and
coming up soon we’ll be opening files.
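
Two points from this walkthrough can be checked in a few lines: float() tolerates the leading space in the extracted piece, and adding a float to the string fails while adding it to the converted value works. This is a sketch under the assumption that piece still holds the leading space (the ipos + 1 version). The exact wording of the error message differs between Python versions, so it is printed rather than relied on.

```python
piece = ' 0.8475'          # extracted text, leading space included

# float() ignores surrounding whitespace, so the space does no harm.
value = float(piece)

# Arithmetic works on the float...
total = value + 42.0

# ...but not on the string: str + float raises TypeError.
try:
    piece + 42.0
    failed = False
except TypeError as err:
    failed = True
    print('TypeError:', err)

print(value, total)
```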

Bonus: Chapter 6


Video: Bonus: Office Hours New York City

Summary of Dr. Chuck’s Office Hours in NYC:

Location: Largest ever Office Hours on 7th St. and 38th, New York City.

Attendees:

  • Diverse group with various Coursera experience levels (first-timers, veterans).
  • Global representation (Russia, USA).
  • Varied backgrounds (teacher, IT, aspiring programmer).
  • Shared enthusiasm for learning Python and meeting Dr. Chuck.

Highlights:

  • Dr. Chuck welcomes everyone and expresses interest in filming future sessions.
  • Attendees introduce themselves, sharing their motivations for attending.
  • Warm atmosphere with laughter and conversation.
  • Dr. Chuck emphasizes community and collaboration among participants.
  • Announcement of next Office Hours location: Miami Beach, Florida.

Overall:

A successful event bringing together Coursera learners of various backgrounds to connect with Dr. Chuck and fellow Python enthusiasts.

Hello everybody, this is Dr. Chuck. We are in New York City. We are at the largest ever Office Hours
on like 7th St. and 38th, right? >> Hey Doc. >> Oh yeah, you can take a picture. That’s okay, you can take a picture too. Oh, that’s right, I keep thinking I
should have you guys film it yourselves. We needed camera people. But okay, here we go. So we’ll just say hi. >> Hi, I’m Dustin this is my
first Coursera course and I’m here to just learn something new. >> Cool! >> Hi, I’m Carmen and it’s my first Coursera course and
I’m here to learn the program. >> Okay.

Hi, I’m Katie. I’ve taken a couple of Coursera
classes and thrilled to be here. >> Hi, I’m Mark and this is the first
one that I really hope to complete. >> Good, I hope you complete as well. >> Right and I do look forward
to working with Chuck and collaborating with my colleagues. >> Cool.
Hi, I’m Vanesh. I’ve lost count of how many
Coursera classes I’ve taken, but it’s been fun to
have the instructor here. >> Cool. >> Hi, my name is Anna. Hello from Russia, and thanks Dr.
Chuck for everything. [LAUGH]
[LAUGH] We’ve had many people from
Russia in the course. >> Hi. I’m Ahmed. I will be a future Coursera Student. >> Okay welcome to the class,
Goodnight. >> In the future. >> [LAUGH]
Hi I’m Rodney. This is my first Coursera course. I’m trying to master in Python. >> Okay. >> Hi I’m Shannon it’s also
my first Coursera course and I’m excited to start it
Okay. >> I’m Brian. I just to learn some coding. >> [LAUGH]
Okay Brian, welcome to the class. >> Hi, I’m Alerta and
I’ve been with Coursera for two years. >> Yeah, yeah,
you were in my very first class. >> Yes.
Well, welcome. >> Yes, I am Michael. It’s my first Coursera course. And I hope that this is
going to be a lot of fun. >> But you’re also a teacher right? >> I am a teacher.
What do you teach? >> I teach biology. >> Teach biology, like high school? >> High school, middle school. Yes.
Ok, so you got to find like
the football coach and teach your football coach to teach python. >> Right. >> Of course. >> Hi, I’m Chris and
this is my 14th or 15th MOOC. >> Holy mackerel. [LAUGH] I just wanted to do a meetup
because professor’s so cool. >> Thanks. >> Hi, I’m Sheram. This is my first Coursera course. I work in IT, and, after speaking with all of these
beautiful people here, I’m so excited. Here I come, programming. >> Okay, so thanks everybody for
[NOISE] Okay. Well, we’ll see everybody online. Next week,
we’ll be in Miami Beach, Florida. So, we’ll see you in Miami.

Video: Bonus: Monash Museum of Computing History

  • The instructor believes that understanding the history of a subject can help students gain a better understanding of it.
  • The instructor takes students to a museum to show them early interfaces and discuss the evolution of interfaces over time.
  • The instructor also uses artifacts like punch cards and slide rules to teach students about the early methods of computing.
  • The instructor emphasizes the importance of physical artifacts and hands-on learning in understanding the history of technology.
  • Forgetting about the history of technology can be a danger, as it can lead to a lack of appreciation for the advancements made.

[MUSIC] Good students get a better understanding
of something if they know the history of it. If they know where, if they know
where it’s come from. That’s my, my view. So when I go to class, and I go to the
interface design class, for instance, I can take them down to the museum and show
them what an early interface was. We, we talk about interface design,
and they think of their, their hand-held computer, their
laptop, their desktop. And they look at the screen and they think
that’s we have, that’s an interface, and sure it is but where, you
know, how did we get there? What was the early interface with
computers? And I take them down to the
Ferranti Sirius and they say, well, what’s the interface here
with the computer. So it’s, it’s giving them the idea of what
an interface is by, by taking them back to something that’s that looks very primitive
and, and then we look at the evolution of interfaces
over time. We look at the different applications,
you know, we go back and look at VisiCalc and WordStar and things like that and
sort of progress through the different versions of these
different products and >> You actually have like running
VisiCalc? >> No, no we don’t [CROSSTALK] but I think it just broadens their
outlook on IT and I think it’s a danger with computing
students that they become very, become very focused on the latest,
and quickly discard the technology they’re using when the
next one comes along and forget about it. >> Even before we had the museum we were lucky, we had the CSIRAC, I’ve
mentioned CSIRAC to you. The world’s number four digital stored
program computer was left over, used till about ’65. So about ’72 or so it arrived
on campus just for storage purposes. But we put it in a display area. When I first arrived and my teaching
here in about ’88 I began to teach. And I
would always take my introduction to computer architecture
students past it, it had so much to tell students about the origin
of operating systems, the primary function of operating systems to allocate resources and improve the efficiency of use of resources. >> With the programming students I can
show them things like punch cards and things
like this. You know, how did we get information into the computers and show them things like
that. So it might be part of one class in a
semester, using the museum in particular for my classes. And others use it too in a similar way. But we do have school children coming on
visits and then we have about an hour or two and we take them through the museum
starting with the calculating machines. And we talk about what what did people do before computers, and about
things like this. What is a computer? Where did we get the name computing from? And I’ve got this picture of women using
slide rules in 1948, I think it was, in, this is in America,
doing their calculations. And the idea that the first computers were
people, okay, and, and often they were women, so that’s kind of
an interesting bit of social history there. So trying to connect the computers
and state of the technology to what was happening in
society at the time. >> The physical aspects that are
important today. That idea of physical is still carried
forward even today as we’re looking to see what we can do with the full museum
that we’ve created now. We’ve toyed with ideas of a web-based series
of exhibitions, photographing all of the artifacts, and making sure that people can sort of browse through, and
move through halls. We feel that even with that web-based
teaching methods, we would still like to have physical
artifacts, we’ve toyed with a box of artifacts that we would
send out to schools that are working on our program,
doing an educational program. So, the physical hands-on is a very
important thing. Museum curators understand it. People who aren’t museum curators
tend not to. >> We had a, a staff member in our,
in our school. When we started the museum, I was
talking to her and, and said oh, we will have
slide rules in it. Now she had done an honors
degree in maths. And she had not heard, she not that she hadn’t seen a slide rule, she didn’t even
know what one was and that, to me, was amazing because when I went to uni,
everyone carried a slide rule, you know. We all had [LAUGH] this is my slide rule
from my university. We all had these, as science students,
engineering students, we all had these tools, and she was probably ten years younger than me,
and suddenly had no idea what I was even talking about. Haven’t heard the word. So I think that
really made me realize that history is so quickly
forgotten and it is a danger that we just, yeah, we
forget about these things, and if we don’t have them around for
people to see, they forget about the history of the
technology that they’re using and I think that’s a real danger. [MUSIC]

Video: Fun: The Textbook Authors Meet @PyCon

In this video, Charles Severance and Jeff Elkner meet Allen Downey, the author of Think Python and Think Java. They discuss the licenses used for the books and the number of derivative books that have been created. They also talk about the importance of open licenses and the benefits of Creative Commons. The video ends with a promise to continue the conversation later.

Another camera, and you’re fine. >> I don’t know.

Is it working? >> Is it working? >> It’s working.
Are you selfieing? >> Yeah, this is my selfie cam. My Gimbal selfie cam. >> It works great. >> Yeah. So this way,
I can tape my meeting of Allen Downey. >> Sweet. >> That’s gonna be sweet. >> The moments. >> The moment that I meet Allen. >> That’s how helpful. >> It is. >> Very cool indeed actually. [LAUGH]
Folks, that’s Allen Downey,
who started the whole Think Java, Think Python, revolution back 1999 and
I’m gonna meet him now. For the first time. >> And you can ask him that question, that
I should have asked him a long time ago. Right? >> So what are your thoughts, Jeff? Right before we meet
the great Allen Downey? >> I’m excited, I’m excited, I’m excited.
Okay. >> [LAUGH]
It’s very exciting, but you’ve met Allen before. >> I have, I have.
This is my first time. >> But it’s always exciting. >> Exactly
[LAUGH] >> Hi. >> It’s Allen Downey. >> [LAUGH]
I’m Charles Severance. >> Charles, it’s great to see you. >> We’re co-authors. >> Yes.
We’ve never met in person. We talked on the phone. This is the famous Jeff Elkner of course. >> I’m thrilled.
Jeff, it’s nice to meet you. >> Made famous by this guy. >> [LAUGH] Yeah, well we all,
everybody got made famous. >> That’s right. >> So
the first question I was asking Jeff, and he doesn’t even know the answer to, is why did you choose GFDL back in 1999? That was before Creative Commons,
before OCW. Before everything. >> Yeah.
What possessed you? >> You know actually the first version of
the book I put it under the GNU three. >> GFDL? >> No, not. >> GPL
The GPL. >> Oh, you did GPL? >> Because I didn’t even
know about the FDL. >> And
then somebody got a hold of you from MIT. >> And they said, wait a minute,
this is documentation, this is not code. You should have used this other
license and so I switched. And then when Creative Commons came along,
some of those licenses have been useful. >> Yep.
Actually part of the reason that working with O’Reilly has
worked out very well. >> Yeah.
I do things usually under a non commercial
Yeah. >> Creative Commons license. And then that’s kind of a compromise. >> Yep.
It’s maybe not ideal. >> But
those are the details the why questions. >> No the non-commercial’s I think a beautiful thing
Yeah. >> And actually Creative Commons is
thinking about getting rid of it. I’m sad about that, because I think
it’s a fine middle ground, for e-copies can be delivered free and
no one feels bad about that. Print copies you make money off of so
it’s pretty cool. >> I agree. I think there’s a nice huge case for it. >> So have you ever tried to count
the number of derivative books of Think Python? >> No.
Do you think it’s 100, 200, 300? >> I don’t think it’s 100.
How to think like a computer scientist,
Java version. The original. >> If you go all the way back. Yeah, all the way. >> Java’s the original. There are a number of English language
books that are modified versions. >> Including mine. >> There are also- yep.
We need to get that data. >> There are translations
into other languages. >> I count those as separate ones. >> Yep.
But like Runestone like Brad Miller’s stuff. >> But then there’s the interactive one. >> The interactive ones, yes. >> And then his book now. >> Do you have an interactive version? >> No, no, no, I just have the Python for
Informatics print and e-version. >> Yeah, but that’s been out for a while. >> Yeah. >> I did that many years ago. And then I fought with Cambridge Press,
remember that? >> Yes.
Remember those calls? So but why did you give it away? I mean it wasn’t so
cool in 1999 just to give books away. It was not normal. >> True, no. >> That’s the question I had. >> Can we pause? I do wanna, I wanna finish
signing books so these folks are- >> Okay, so we’ll have to talk later. >> Okay, okay, we’ll get back to this. >> Let’s talk more.

Additional materials


Reading: Audio Versions of All Lectures

Reading