
Module 3: Work with strings and lists

You will learn more options for working with strings and lists in Python, and discover methods that can be applied to these data types. You’ll apply this knowledge to write a short algorithm. Finally, you’ll use regular expressions to search for patterns in strings.

Learning Objectives

  • Use Python to work with strings and lists.
  • Write a simple algorithm.
  • Use regular expressions to extract information from text.

Working with strings


Video: Welcome to module 3

This text introduces a section of a security analyst course focused on Python programming for data management. It highlights key topics covered:

  1. Data manipulation: Expanding skills in accessing and processing data stored in strings and lists, specifically extracting elements.
  2. Algorithm development: Learning to write sets of instructions (algorithms) in Python to solve security-related problems.
  3. Regular expressions: Refining text searches using powerful pattern matching techniques.

The speaker expresses excitement about the upcoming lessons and the potential for writing “interesting code”.

Key takeaways:

  • Python skills are valuable for security analysts dealing with data.
  • This section builds on existing knowledge of data types, variables, control flow, and functions.
  • The curriculum includes advanced data manipulation, algorithm development, and regular expressions.

As a security analyst, you’ll work with a lot of data. Being able to develop solutions for managing this data is very important. What we’re about to learn in Python will help you with that.

Previously, we set foundations for what we’re going to do in this section. We learned all about data types and variables. We also covered conditional and iterative statements. We learned about building functions and even created our own functions. Here, we’ll build on that in a few different ways.

First, you’ll learn more about working with strings and lists. We’ll expand the ways that you can work with these data types, including extracting characters from strings or items from lists. Our next focus is on writing algorithms. You’ll consider a set of rules that can be applied in Python to solve a security-related problem. Finally, we’ll further expand the ways we can search for strings when we explore using regular expressions.

We’re going to have a lot of fun, and you’ll start writing some really interesting code in Python. I can’t wait to get started.

Video: String operations

Here’s a summary of the key points about working with strings in Python:

1. Strings in Python:

  • Ordered sequences of characters enclosed in double or single quotes.
  • Can be stored in variables.
  • Example: my_string = "security"

2. Creating Strings from Other Data Types:

  • Use the str() function to convert integers, floats, etc., into strings.
  • Example: new_string = str(123)

3. Basic String Operations:

  • Finding length: Use the len() function to determine the number of characters.
  • Concatenation: Join strings together with the + operator.
  • Methods: Functions specific to strings, applied using a dot after the string.

4. Common String Methods:

  • upper(): Returns a copy of the string in uppercase.
  • lower(): Returns a copy of the string in lowercase.

5. Upcoming Topics:

  • Indexing and splitting strings.

Here’s a tutorial on working with strings in Python:

Welcome to the world of strings!

What are strings?

  • Strings are ordered collections of characters used to represent text.
  • They’re essential for tasks like storing usernames, passwords, messages, and any other text-based data.
  • In Python, strings are enclosed in single or double quotes (it’s your choice).

Creating Strings:

Directly:

my_name = "Alice"
my_message = "Hello, world!"

From other data types:

my_number = 123
my_string_number = str(my_number)  # Now my_string_number is "123"

Basic Operations:

Finding length:

username = "johndoe"
username_length = len(username)  # Stores 7 in username_length

Concatenation (joining strings):

greeting = "Hello" + " " + "world!"  # Results in "Hello world!"

String Methods:

  • Think of methods as functions that belong to a specific data type. String methods work only with strings.
  • To use a method, put a dot after the string, followed by the method name and parentheses.

Common Methods:

  • upper():
name = "Bard"
uppercase_name = name.upper()  # Stores "BARD" in uppercase_name
  • lower():
message = "HELLO THERE"
lowercase_message = message.lower()  # Stores "hello there" in lowercase_message

Stay tuned for more advanced string techniques!

Knowing how to work with string data in security is important. For example, you might find yourself working with usernames to find patterns in login information. We’re going to revisit the string data type and learn how to work with it in Python.

First, let’s have a quick refresher on strings. We defined string data as data consisting of an ordered sequence of characters. In Python, strings are written in between quotation marks. It’s okay to use either double or single quotation marks, but in this course, we’ve been using double quotation marks. As examples, we have the strings “Hello”, “123”, and “Number 1!” We also previously covered variables. Here, the variable my_string is currently storing the string “security”.

You can also create a string from another data type, such as an integer or a float. To do that, we need to introduce a new built-in function, the string function, str(). The string function converts the input object into a string. Converting objects to strings allows us to perform tasks that are only possible for strings. For example, we might convert an integer into a string to remove elements from it or to re-order it. Both are difficult for an integer data type.

Let’s practice converting an integer to a string. We’ll apply the string function to the integer 123. Now, the variable new_string contains a string of three characters: 1, 2, and 3. Let’s print its type to check. We’ll run it. Perfect, it tells us that we now have a string! Awesome!
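
The conversion described in this part of the video can be sketched in a few lines. The variable name new_string comes from the transcript; checking the result with the built-in type() function is an assumption about how the on-screen code performs the check.

```python
# Convert the integer 123 to a string with the built-in str() function.
new_string = str(123)

# The result is the three-character string "123", not a number.
print(new_string)        # 123
print(type(new_string))  # <class 'str'>
```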
So far, we know different ways to create and store a string. Now, let’s explore how to perform some basic string operations.

Our first example is the length function, len(). The length function returns the number of elements in an object. Using it on a string tells us how many characters the string has. Earlier in the program, we learned that IP addresses have two versions, IPv4 and IPv6. IPv4 addresses have a maximum of 15 characters. So a security professional might use the length function to check if an IPv4 address is valid. If its length is greater than 15 characters, then we’d know that it’s an invalid IPv4 address.

Let’s use this function to print the length of the string “Hello”. We’ll nest the length function within the print function because we want to first calculate the length of this string and then print it to the screen. Okay, let’s run this and check out how many characters Python counts. The output is 5, one for each letter in the word Hello.

We can also use the addition operator on strings. This is called string concatenation. String concatenation is the process of joining two strings together. For example, we can add the strings “Hello” and “world” together. To concatenate strings, we can use the + symbol. After we run it, we get “Helloworld” with no spaces in between the two strings. It’s important to note that some operators don’t work for strings. For example, you cannot use a minus sign to subtract two strings.

Finally, we’re going to talk about string methods. A method is a function that belongs to a specific data type. So, using a string method on another data type, like an integer, would cause an error. Unlike other functions, methods appear after the string. Two common string methods are the upper and the lower methods.

The upper method returns a copy of the string in all uppercase letters. Let’s apply the upper method to the string “Hello”. We’ll place this inside of a print function to output it to the screen. Let’s focus on the unique syntax of methods. After our string “Hello”, we place a period or dot, and then specify the method we want to use. Here, that’s upper(). Okay, now we’re ready to run this. HELLO is printed to the screen in all uppercase letters.

Similarly, the lower method returns a copy of the string in all lowercase letters. Let’s apply the lower method on the “Hello” string. Remember that we need to put the string and the method inside of a print function to output the results. And now, we have the string printed in all lowercase letters.

Coming up, we’re going to learn a lot more about strings, like indexing and splitting strings. I’m looking forward to meeting you there!
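
The operations from this video can be gathered into one short sketch. The sample address and the names ip_address and greeting are assumptions made for illustration. A length check like this one is only a rough first filter, since a string of 15 or fewer characters is not necessarily a valid IPv4 address.

```python
# len() returns the number of characters in a string.
print(len("Hello"))  # 5

# Rough first check: a dotted-decimal IPv4 address has at most 15 characters.
# The address below is a made-up example.
ip_address = "192.168.243.140"
if len(ip_address) > 15:
    print("too long to be a valid IPv4 address")
else:
    print("length is plausible for an IPv4 address")

# The + operator joins strings exactly as written, without adding a space.
greeting = "Hello" + "world"
print(greeting)  # Helloworld

# upper() and lower() return new strings; the original string is unchanged.
print("Hello".upper())  # HELLO
print("Hello".lower())  # hello
```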

Video: String indices and slices

Here’s a summary of key takeaways about working with strings in Python:

1. Indices:

  • Each character in a string has an index, starting from 0.
  • Access individual characters using square brackets: my_string[1] returns the second character.

2. Slicing:

  • Extract substrings using a range of indices: my_string[1:4] returns characters from index 1 up to (but not including) index 4.

3. Searching with index():

  • Finds the first occurrence of a substring and returns its index: my_string.index("E") returns the index of the first “E”.
  • Case-sensitive.

4. Immutability:

  • Strings cannot be changed after creation.
  • Attempting to modify characters using index notation results in an error.

5. Applications in Security:

  • Locating usernames or IP addresses in logs.
  • Finding specific characters or patterns (e.g., “@” symbol in emails).
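
The first three points above can be sketched with the string “HELLO” used in the video. This is a minimal illustration; the variable name my_string is reused from the summary.

```python
my_string = "HELLO"

# Indexing starts at 0, so index 1 is the second character.
print(my_string[1])    # E

# A slice includes the start index and excludes the end index.
print(my_string[1:4])  # ELL

# index() is case-sensitive and returns the position of the first match only.
print(my_string.index("E"))  # 1
print(my_string.index("L"))  # 2
```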

Here’s a comprehensive tutorial on working with strings in Python:

Welcome to the world of string manipulation!

Mastering strings is essential for security analysts and anyone working with text data. Let’s dive into key concepts and techniques:

1. Creating Strings:

Directly: Enclose text in single or double quotes:

my_name = "Alice"
my_message = "Hello, world!"

From other data types: Use str() to convert to string:

my_number = 123
my_string_number = str(my_number)  # Now my_string_number is "123"

2. Accessing Characters and Substrings:

  • Indexing: Retrieve a single character using its index (starting from 0):
first_char = my_name[0]  # Stores "A"
  • Slicing: Extract a substring using a range of indices:
middle_part = my_name[1:4]  # Stores "lic"

3. Searching for Characters and Substrings:

  • index() method: Returns the index of the first occurrence:
e_index = my_name.index("e")  # Stores 4
  • Case-sensitive!

4. Modifying Strings (Indirectly):

  • Strings are immutable: Cannot directly change characters.
  • Create new strings with modifications:
uppercase_name = my_name.upper()  # Stores "ALICE"

5. Common String Methods:

  • upper(): Converts to uppercase.
  • lower(): Converts to lowercase.
  • find(): Finds a substring (returns -1 if not found).
  • replace(): Replaces a substring with another.
  • split(): Splits a string into a list of substrings.
  • join(): Joins a list of strings into a single string.
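
The last four methods in this list have no example yet, so here is a minimal sketch of them. The log entry, username, and variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical log entry used only for illustration.
log_entry = "login failed for user elarson from 192.168.243.140"

# find() returns the starting index of a substring, or -1 if it is absent.
print(log_entry.find("elarson"))  # 22
print(log_entry.find("bmoreno"))  # -1

# replace() returns a new string; log_entry itself is not modified.
redacted = log_entry.replace("elarson", "REDACTED")
print(redacted)

# split() with no argument splits on whitespace and returns a list.
words = log_entry.split()
ip_address = words[-1]
print(ip_address)  # 192.168.243.140

# split(".") breaks the address into its four parts; join() reverses that.
octets = ip_address.split(".")
print(octets)            # ['192', '168', '243', '140']
print(".".join(octets))  # 192.168.243.140
```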

6. Applications in Security:

  • Extracting usernames, IP addresses, email addresses, etc., from logs.
  • Searching for specific patterns or keywords.
  • Validating user input for security purposes.

Practice makes perfect! Experiment with these techniques and explore more methods to become a string ninja!

What does the code print("HELLO"[2:4]) output?

“LL”

The code print(“HELLO”[2:4]) outputs “LL”. The first index in the slice is included in the output, but the second index in the slice is not included. This means the slice starts at the character at index 2 and ends one character before index 4.

In security, there are a variety of reasons we might need to search through a string. For example, we might need to locate a username in a security log. Or, if we learn that a certain IP address is associated with malware, we might search for this address in a network log. The first step in being able to use Python in these ways is learning about the index of characters in a string.

The index is a number assigned to every element in a sequence that indicates its position. In this video, we are discussing strings. So, the index is the position of each character in a string. Let’s start with the string “HELLO.” Every character in the string is assigned an index. In Python, we start counting indices from 0. So, the character “H” has an index of 0, and “E” has an index of 1, and so on.

Let’s take this into Python and practice using indices. Placing an index in square brackets after a string returns the character at that index. Let’s place the index 1 in square brackets after “HELLO” and run it. This returned the character “E.” Remember, indices start at 0, so an index of 1 isn’t the first character in the word.

But what if we want it to return more than just one character? We can extract a larger part of a string by specifying a set of indices. This is called a slice. When taking a slice from a string, we specify where the slice starts and where the slice ends. So we provide two indices. The first index is the beginning, which is included in the output. The second index is the end, but it’s not included in the final output. Instead, Python stops the slice at the element before the second index. For example, if we wanted to take the letters E-L-L from “HELLO,” we would start the interval from the index 1, but we’d end before the index 4.

Let’s try this example and extract a slice from a string in Python. Let’s type in the string and take the slice starting at index 1 and ending before index 4. Now, let’s run the code and examine the output. There’s the slice we wanted.

Now that we know how to describe the location of a character in a string, let’s learn how to search in a string. To do this, we need to use the index method. The index method finds the first occurrence of the input in a string and returns its location.

Let’s practice using the index method in Python. Let’s say we want to use the index method to find the character “E” in the string “HELLO.” We’ll locate the first instance of the character “E.” Let’s examine this line in more detail. After writing the string and the index method, we use the character we want to find as the argument of the index method. Remember, strings in Python are case-sensitive, so we need to make sure we use the appropriate case with the index method. Let’s run this code now. This returned the number 1. This is because “E” has an index value of 1.

Now, let’s explore an example where a character repeats multiple times in the string. Let’s try searching for the character “L.” We start with similar code as before, passing the argument “L” instead of “E” to the index method. Now, let’s run this code and investigate the result. The result is the index 2. This tells us that the method only identified the first occurrence of the character “L” and not the second. This is an important detail to notice when working with the index method.

As a security analyst, learning how to work with indices allows you to find certain parts in a string. For example, if you need to find the location of the @ symbol in an email, you can use the index method to find what you’re looking for with one line of code.

Now let’s turn our attention to an important property of strings. Have you ever heard the expression “some things never change”? It might be said about the comfortable feeling you have with a good friend, even when you haven’t seen them for a long time. Well, in Python, we can also say this about strings. Strings are immutable. In Python, “immutable” means that an object cannot be changed after it’s created and assigned a value.

Let’s break this down with an example. Let’s assign the string “HELLO” to the variable my_string. Now, if we want to change the character “E” to an “A” so my_string has the value “HALLO,” then we might be inclined to use index notation. But here we get an error. The string stored in my_string is immutable, so we cannot make changes like this.
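
The error described here, and the @ lookup mentioned a little earlier, can be reproduced with a short sketch. The try/except wrapper, the slice-based workaround, and the email address are assumptions added for illustration and are not shown in the video.

```python
my_string = "HELLO"

# Strings are immutable, so assigning to an index raises a TypeError.
try:
    my_string[1] = "A"
except TypeError as error:
    print("Cannot modify a string in place:", error)

# To get "HALLO", build a new string from slices of the original.
new_string = my_string[:1] + "A" + my_string[2:]
print(new_string)  # HALLO

# index() can locate the @ symbol in an email address (made-up address).
email = "analyst@example.com"
at_position = email.index("@")
print(at_position)          # 7
print(email[:at_position])  # analyst
```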
And there you have it! You’ve just learned how to index and slice into strings. You’ve also seen that strings are immutable. You cannot reassign characters after a string has been defined. Coming up, we’ll learn about list operations and see that lists can be changed with index notation. Meet you there.

Reading: Strings and the security analyst


Lab: Activity: Work with strings in Python

Lab: Exemplar: Work with strings in Python

Practice Quiz: Test your knowledge: Work with strings

Which of the following statements correctly describe strings? Select all that apply.

What does the following code return?

What does the following code display?

You want to find the index where the substring “192.168.243.140” starts within the string contained in the variable ip_addresses. Complete the Python code to find and display the starting index.

Work with lists and develop algorithms


Video: List operations in Python

Here’s a summary of the key points about lists in Python:

What are lists?

  • Lists are a data type that store multiple pieces of data in a single variable.
  • They are useful for storing collections of related items, such as IP addresses, application names, or other security-related data.

Creating lists:

  • Use square brackets [] to enclose the items in the list, separated by commas.
  • Assign the list to a variable for easy reference.

Accessing elements:

  • Use index values (starting from 0) within square brackets after the variable name to access specific elements.

Concatenating lists:

  • Use the + operator to combine two lists into a new list.

Key differences from strings:

  • Strings are immutable (cannot be changed after creation), while lists are mutable (can be modified).

Modifying lists:

  • Change elements: Use bracket notation and variable assignment to change existing elements.
  • Insert elements: Use the insert() method to add elements at a specific position.
  • Remove elements: Use the remove() method to delete the first occurrence of a specific element.

Importance in security:

  • Lists are essential for organizing and managing security data effectively.
  • Understanding how to work with lists is a crucial skill for security professionals.

Here’s a tutorial on lists in Python, covering key concepts and operations:

Understanding Lists:

  • Definition: Lists are ordered collections of items, allowing you to store multiple pieces of data under a single variable name.
  • Mutability: Unlike strings, lists are mutable, meaning you can change, add, or remove elements after their creation.
  • Security applications: Lists are frequently used in security tasks to manage IP addresses, blocked applications, log entries, and other security-related data.

Creating Lists:

  • Syntax: [item1, item2, item3, ...]
  • Example: my_list = ["apple", "banana", "cherry"]

Accessing Elements:

  • Indexing: Use square brackets with the index value (starting from 0) to access individual elements.
  • Example: print(my_list[1]) would output “banana”.

Modifying Lists:

  • Changing elements:
    • my_list[index] = new_value
    • Example: my_list[1] = "mango"
  • Inserting elements:
    • my_list.insert(index, element)
    • Example: my_list.insert(1, "orange")
  • Removing elements:
    • my_list.remove(element)
    • Example: my_list.remove("cherry")

Common List Operations:

  • Concatenation: new_list = list1 + list2
  • Length: len(my_list)
  • Membership: item in my_list
  • Indexing: my_list.index(item)
  • Sorting: my_list.sort()
  • Reversing: my_list.reverse()
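
A minimal sketch of these operations follows. The usernames reuse the style of the examples in this module, and the list names approved, pending, and all_users are hypothetical.

```python
# Hypothetical username lists used only for illustration.
approved = ["elarson", "bmoreno"]
pending = ["tshah", "eraab"]

# Concatenation creates a new list and leaves both originals unchanged.
all_users = approved + pending
print(len(all_users))            # 4
print("tshah" in all_users)      # True
print(all_users.index("eraab"))  # 3

# sort() reorders the list in place (alphabetically for strings).
all_users.sort()
print(all_users)  # ['bmoreno', 'elarson', 'eraab', 'tshah']

# reverse() also works in place.
all_users.reverse()
print(all_users)  # ['tshah', 'eraab', 'elarson', 'bmoreno']
```

Note that sort() and reverse() change the existing list and return None, so their result should not be assigned to a variable.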

Iterating Through Lists:

  • For loop:

for item in my_list:
    print(item)

  • While loop:

index = 0
while index < len(my_list):
    print(my_list[index])
    index += 1

Additional Methods for Advanced Operations:

  • append(): Add an element to the end of the list.
  • pop(): Remove and return the element at a specific index (defaulting to the last element).
  • clear(): Remove all elements from the list.
  • count(): Count the occurrences of a specific element.

Remember:

  • Lists are versatile tools for organizing and managing data in Python.
  • Mastering lists is essential for effective security data handling and analysis.

In the list ["elarson", "bmoreno", "tshah", "eraab"], which element has an index of 3?

“eraab”

In the list [“elarson”, “bmoreno”, “tshah”, “eraab”], the element “eraab” has an index of 3. In Python, indices start at 0, so the element that has an index of 3 is the fourth element.

Another data type we discussed previously is the list. Lists are useful because they allow you to store multiple pieces of data in a single variable. In the security profession, you will work with a variety of lists. For example, you may have a list of IP addresses that have accessed a network, and another list might hold information on applications that are blocked from running on the system.

Let’s recap how to create a list in Python. In this case, the items in our list are the letters A through E. We separate them by commas and surround them with square brackets. We can also assign our list to a variable to make it easier to use later. Here, we’ve named our variable my_list.

When we access specific elements from lists, we use syntax similar to when we access specific elements from strings. We place the index value in brackets after the variable that stores the list. So this would access the second item in the list. This is because in Python, we start counting the elements in the list at zero and not at one. So the index for the first element is zero and the index for the second element is one.

Let’s try extracting some elements from a list. We’ll extract the second element by putting 1 in brackets after the variable. We place this in a print() function to output the results, and after we run it, Python outputs the letter “b”.

Similar to strings, we can also concatenate lists with the plus sign. List concatenation is combining two lists into one by placing the elements of the second list directly after the elements of the first list. Let’s work with this in Python.
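
The list access and concatenation covered in this part of the video can be sketched as follows. The name my_list comes from the transcript; num_list and combined are assumed names, since the transcript does not name the second list or the result.

```python
# The list from the video: the letters a through e.
my_list = ["a", "b", "c", "d", "e"]

# Index 1 is the second element because counting starts at 0.
print(my_list[1])  # b

# A second list holding the numbers 1 through 4 (variable name assumed).
num_list = [1, 2, 3, 4]

# The + operator places the elements of num_list after those of my_list.
combined = my_list + num_list
print(combined)  # ['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4]
```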
Let’s concatenate two lists. First, we define the same list as in the previous example and store it in the variable my_list. Now, let’s define an additional list with the numbers 1 through 4. Finally, let’s concatenate the two lists with a plus sign and print out the result. And when we run it, we have a final concatenated list.

Having discussed the similarities, let’s now explore the differences between lists and strings. We mentioned earlier that strings are immutable, meaning after they are defined, they cannot be changed. Lists, on the other hand, do not have this property, and we can freely change, add, and remove list values. So, for example, if we have a list of malicious IP addresses, then every time a new malicious IP address is identified, we can easily add it to the list.

Let’s first try changing a specific element in a list in Python. We start with the list used in the previous example.
To change an element in a list, we combine what we learned about bracket notation with what we learned about variable assignment. Let’s change the second element in my_list, which is the string “b”, to the number 7. We place the object we want to change on the left-hand side of the variable assignment. In this case, we’ll change the second element in my_list. Then we place an equals sign to indicate we are reassigning this element of the list. Finally, we place the object to take its place on the right-hand side. Here, we’ll reassign the second list element to a value of 7. Now let’s print out the list and run the code to examine the change. Perfect! The letter “b” is now changed to the number 7.

Now, let’s take a look at methods for inserting and removing elements in lists. The first method we’ll work with in this video is the insert method. The insert method adds an element in a specific position inside a list. The method takes two arguments: the first is the position we’re adding the element to, and the second is the element we want to add.

Let’s use the insert method. We’ll start with the list we defined in our my_list variable. Then we type my_list.insert and pass in two arguments. The first argument is the position where we want to insert the new information. In this case, we want to insert into index 1. The second argument is the information we want to add to the list; in this case, the integer 7. Now let’s print my_list. Our list still begins with “a”, the element with an index of 0, and now, we have the integer 7 in the next position, the position represented with an index of 1. Notice that the letter “b”, which was originally at index 1, did not get replaced like when we used bracket notation. With the insert method, every element from index 1 onward is simply shifted down by one position.
The index of “b” is now 2.

Sometimes we might want to remove an element that is no longer needed from a list. To do this, we can use the remove method. The remove method removes the first occurrence of a specific element in the list. Unlike insert, the argument of remove is not an index value. Instead, you directly type the element you want to remove. The remove method removes the first instance of it in the list.

Let’s use the remove method to delete the letter “d” from our list. We’ll type the name of our variable my_list, then add the remove method. We want to remove “d” from this list. So, we’ll place it in quotation marks as our argument. Then we’ll print my_list. And let’s run this. Perfect! “d” has now been removed from the list.

Just like with strings, being able to search through lists is a necessary skill for security analysts. I’m looking forward to expanding our understanding as we move forward in this course.
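
The three ways of modifying a list shown in this video can be compared side by side in a short sketch. The list is reset before each step so the results match the transcript; the final ValueError check is an addition that is not part of the video.

```python
# Bracket notation plus assignment replaces the element at index 1.
my_list = ["a", "b", "c", "d", "e"]
my_list[1] = 7
print(my_list)  # ['a', 7, 'c', 'd', 'e']

# insert() adds an element at an index and shifts later elements down.
my_list = ["a", "b", "c", "d", "e"]
my_list.insert(1, 7)
print(my_list)             # ['a', 7, 'b', 'c', 'd', 'e']
print(my_list.index("b"))  # 2

# remove() takes the element itself, not an index, and deletes
# only its first occurrence.
my_list = ["a", "b", "c", "d", "e"]
my_list.remove("d")
print(my_list)  # ['a', 'b', 'c', 'e']

# Removing a value that is not in the list raises a ValueError.
try:
    my_list.remove("z")
except ValueError:
    print("'z' is not in the list")
```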

Video: Write a simple algorithm

Topic: Algorithms and solving problems in Python with loops, lists, and strings.

Key points:

  • Algorithms are sets of rules that solve problems.
  • They take input, perform tasks, and return a solution as output.
  • This example focuses on extracting the first three digits from a list of IP addresses.
  • The solution involves:
    • String slicing: extracting specific characters from a string.
    • Looping: applying the same steps to each element in a list.
    • Append method: adding elements to the end of a list.
  • We break down the problem into smaller steps before writing code.

Takeaways:

  • Algorithms are fundamental for solving problems in coding.
  • Combining different Python concepts like loops, lists, and strings allows for powerful solutions.
  • Breaking down complex problems into smaller steps helps with efficient code writing.

Here’s a tutorial on algorithms and problem-solving in Python, covering loops, lists, and strings:

Understanding Algorithms:

  • Definition: An algorithm is a set of steps or rules designed to solve a specific problem.
  • Input, Processing, Output: Algorithms take an input, perform actions on that input, and produce an output as a solution.
  • Problem-Solving in Coding: Algorithms are fundamental to writing code that effectively addresses various tasks.

Key Python Concepts:

  • Lists: Ordered collections of items used to store multiple pieces of data in a single variable.
  • Strings: Sequences of characters used to represent text.
  • Loops: Code blocks that repeat a set of instructions multiple times, often used to iterate over items in lists.

Problem Example: Extracting Network Identifiers:

  • Task: Given a list of IP addresses, extract the first three characters of each address. In the course example these are the leading digits, which indicate the network the address belongs to.

Steps for Algorithm Design:

  1. Break Down the Problem:
    • Focus on extracting the first three digits from a single IP address.
  2. Solve the Smaller Problem:
    • Use string slicing to extract the desired characters: address[:3]
  3. Apply to the Entire List:
    • Use a for loop to iterate over each IP address in the list.
    • Within the loop, apply string slicing to each address and store the results in a new list.

Python Code Implementation:

ip_addresses = ["192.168.0.1", "10.0.0.2", "172.16.0.3"]  # Sample IP list
network_identifiers = []  # Empty list to store results

for address in ip_addresses:
    network_id = address[:3]  # Extract first three characters
    network_identifiers.append(network_id)  # Add to the result list

print(network_identifiers)  # Output: ['192', '10.', '172']

Explanation:

  • The for loop iterates through each IP address in the ip_addresses list.
  • Inside the loop, string slicing extracts the first three characters using address[:3].
  • The append() method adds the extracted network ID to the network_identifiers list.
  • The final print() statement displays the list of extracted network identifiers.
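
In the sample output, the second result is '10.' because the first part of "10.0.0.2" has only two digits, so a fixed three-character slice also captures the dot. If the goal is the first dot-separated part of each address, one alternative is to split on the dot. This is a swapped-in technique rather than the course's approach, and the name first_octets is an assumption.

```python
ip_addresses = ["192.168.0.1", "10.0.0.2", "172.16.0.3"]  # Same sample list
first_octets = []  # Empty list to store results

for address in ip_addresses:
    # split(".") returns the dot-separated parts; index 0 is the first part.
    first_octet = address.split(".")[0]
    first_octets.append(first_octet)

print(first_octets)  # ['192', '10', '172']
```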

Key Takeaways:

  • Combine different Python concepts to create powerful solutions for various problems.
  • Break down complex problems into smaller, more manageable steps.
  • Use loops to repeat actions on multiple items in lists.
  • Utilize string slicing to extract specific parts of text data.
  • Practice designing and implementing algorithms to enhance your problem-solving skills in Python.

In our everyday lives, we frequently follow rules
for solving problems. As a simple example, imagine you want
a cup of coffee. If you’ve made
coffee many times, then you likely follow
a process to make it. First, you grab
your favorite mug. Then, you put water into the coffee maker and add
your coffee grounds. You press the start button
and wait a few minutes. Finally, you enjoy your
fresh cup of coffee. Even if you have a
different approach to making coffee or don’t
drink coffee at all, you will likely
follow a set of rules for completing similar
everyday tasks. When you complete
these routine tasks, you’re following an algorithm. An algorithm is a set of
rules that solve a problem. In more detail, an
algorithm is a set of steps that takes an
input from a problem, uses this input
to perform tasks, and returns a solution
as an output. Let’s explore how algorithms can be used to solve
problems in Python. Imagine that you, as
a security analyst, have a list of IP addresses. You want to extract the first three digits
of each IP address, which will tell you
information about the networks that these
IP addresses belong to. To do this, we’re going to
write an algorithm that involves multiple
Python concepts that we’ve covered so far: loops, lists, and strings. Here’s a list with IP addresses that are stored as strings. For privacy reasons,
in our example, we’re not showing the
full IP addresses. Our goal is to extract
the first three numbers of each address and store
them in a new list. Before we write any Python code, let’s break down an approach to solving this problem
with an algorithm. What if you had one IP address
instead of an entire list? Well, then the problem
becomes much simpler. The first step in solving the problem will
be to use string slicing to extract the
first three digits from one IP address. Now let’s consider how to
apply these to an entire list. As the second step, we’ll use a loop to apply that solution to every
IP address on the list. Previously, you learned
about string slicing, so let’s write some Python code to solve the problem
for one IP address. Here we’re starting
with one IP address that begins with 198.567. And we’ll write a few
lines of code to extract the first
three characters. We’ll use the bracket
notation to slice the string. Inside the print statement, we have the address variable, which contains the IP
address we want to slice. Remember that Python
starts counting at 0. To get the first
three characters, we start our slice at index 0 and then continue all
the way until index 3. Remember that Python
excludes the final index. In other words, Python will return the
characters at index 0, 1, and 2. Now, let’s run this. We get the first three
digits of the address: 198. Now that we’re able to solve this problem for one IP address, we can put this code
into a loop and apply it to all IP addresses
in the original list. Before we do this, let’s introduce one more
method that we’ll be using in this code: the append method. The append method adds
input to the end of a list. For example, let’s say
that my list contains 1, 2, and 3. With this code, we can use the append method to
add 4 to this list. First, we are given the IP list. Now, we’re ready to extract the first three characters from each element in this list. Let’s create an
empty list to store the first three characters
of each IP from the list. Now we can start the for loop. Let’s break this down. The word “for” tells Python that we’re about to
start a for loop. We then choose address as the variable inside
of the for loop, and we specify the list
called IP as the iterable. As the loop runs, each element from
the IP list will be stored temporarily in
the address variable. Inside the for loop, we have a line of code to add the slice from address
to the networks list. Breaking this down, we
use the code we wrote earlier to get the
first three characters of an IP address. We’ll use our append method to add an item to
the end of a list. In this case, we’re adding
to the networks list. Finally, let’s print the
networks list and run the code. The variable networks
now contains a list of the first three digits of each IP address in the
original list: IP. That was a lot of information. Designing algorithms
can be challenging. It’s a good idea to
break them down into smaller problems before jumping
into writing your code. We’ll continue to
practice this idea in the upcoming videos.
Meet you there.
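
The two steps described in the video can be sketched roughly as follows. The video does not show full addresses, so the sample addresses here are hypothetical placeholders; the names `ip`, `address`, and `networks` follow the narration.

```python
# Step 1: solve the problem for a single address with string slicing.
# The address is a made-up placeholder, not one shown in the video.
address = "198.51.100.7"
first_three = address[0:3]  # characters at indices 0, 1, and 2; index 3 is excluded
print(first_three)          # 198

# Step 2: apply the same slice to every address in a list.
ip = ["198.51.100.7", "192.0.2.44", "203.0.113.9"]  # hypothetical sample data
networks = []  # empty list that will collect the results

for address in ip:
    # append() adds the sliced prefix to the end of the networks list
    networks.append(address[0:3])

print(networks)  # ['198', '192', '203']
```

Note that a fixed slice only works when every address starts with exactly three digits. An address such as "10.0.0.1" would produce "10." instead, so a more general version would need a different approach, for example splitting each address on the period.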

Reading: Lists and the security analyst


Lab: Activity: Develop an algorithm

Lab: Exemplar: Develop an algorithm

Practice Quiz: Test your knowledge: Work with lists and develop algorithms

Review the following code:

You are working with the list ["cwvQSQ", "QvPvX5", "ISyT3a", "S7vgN0"]. Its elements represent machine IDs, and the list is stored in a variable named machine_ids. Which line of code will add the ID of "yihhLL" at index 3?

Which line of code will remove the username “tshah” from the following list?
access_list = ["elarson", "bmoreno", "tshah", "sgilmore"]

As a security analyst, you are responsible for developing an algorithm that automates removing usernames that match specific criteria from an access list. What Python components would help you implement this? Select three answers.

Regular expressions


Video: Regular expressions in Python

Here’s a summary of the key points from the text:

Regular Expressions (Regex):

  • Sequences of characters that form patterns to search within text.
  • Used for advanced string searching beyond simple methods like indexing and slicing.
  • Useful for finding patterns like specific prefixes, lengths, or structures.

Example: Extracting Email Addresses from a Log

  1. Import the re module: This module provides functions for working with regular expressions in Python.
  2. Define the Regex Pattern:
    • \w+: Matches one or more alphanumeric characters.
    • @: Matches the “@” symbol literally.
    • \.: Matches a period (escaped to avoid its special meaning in regex).
    • Full pattern for email addresses: \w+@\w+\.\w+
  3. Use the re.findall() Function:
    • Takes the regex pattern and the string to search as arguments.
    • Returns a list of all matches to the pattern.

Example Code:

Python

import re

email_log = """... (your log string here) ..."""

emails = re.findall(r"\w+@\w+\.\w+", email_log)
print(emails)  # Output: List of extracted email addresses
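
For a version that can be run as is, here is a minimal sketch with a stand-in log. The log lines are invented for illustration; only the pattern and the `re.findall()` call come from the summary above.

```python
import re

# Hypothetical log contents, invented for this example.
email_log = """
Email received from user1@email1.com.
Email received from user2@email2.com.
Email rejected from invalid_email3@email3.com.
"""

# \w+  one or more alphanumeric characters (or underscores)
# @    a literal @ symbol
# \.   a literal period
emails = re.findall(r"\w+@\w+\.\w+", email_log)
print(emails)
# ['user1@email1.com', 'user2@email2.com', 'invalid_email3@email3.com']
```

This pattern is deliberately simple. Because `\w` does not match periods or hyphens, an address such as first.last@example.com would only be captured from "last" onward, so real-world logs may call for a broader pattern.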

Key Points:

  • Regex is a powerful tool for pattern matching in text.
  • It can be used to extract specific information from logs, files, or other text data.
  • The re module in Python provides functions for working with regular expressions.
  • The re.findall() function is used to find all matches to a regex pattern in a string.

Here’s a tutorial on regular expressions (regex) in Python:

Introduction

  • What are regular expressions?
    • Sequences of characters that define search patterns.
    • Used to match, locate, and manipulate text.
    • Powerful tool for text processing and data extraction.
  • Why use regex in Python?
    • Advanced string searching and manipulation beyond basic methods.
    • Extract specific information from website content, logs, files, etc.
    • Validate user input formats (e.g., email addresses, phone numbers).
    • Clean and transform text data for analysis.

Getting Started

  1. Import the re module: import re
  2. Basic Concepts:
    • Metacharacters: Special characters with specific meanings in regex.
      • .: Matches any single character except newline.
      • \w: Matches any alphanumeric character (letters, digits, underscore).
      • \d: Matches any digit.
      • \s: Matches any whitespace character.
      • ^: Matches the beginning of a string.
      • $: Matches the end of a string.
      • *: Matches zero or more occurrences of the preceding character.
      • +: Matches one or more occurrences of the preceding character.
      • ?: Matches zero or one occurrence of the preceding character.
      • |: Matches either the pattern before or after the symbol.
      • (...): Groups characters together to create subpatterns.
    • Raw strings: Use r prefix to avoid escaping backslashes in regex patterns.
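
To make a few of these metacharacters concrete, here is a small sketch that runs each one against the same string. The sample line and the variable names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical log line used only to exercise the metacharacters.
sample = "user_7 logged in at 09:45 from host-12"

single_digits = re.findall(r"\d", sample)      # each digit on its own
digit_runs = re.findall(r"\d+", sample)        # runs of one or more digits
words = re.findall(r"\w+", sample)             # runs of letters, digits, underscores
first_word = re.findall(r"^\w+", sample)       # only the run at the start of the string
trailing_digits = re.findall(r"\d+$", sample)  # only the digits at the end of the string
either = re.findall(r"in|at", sample)          # either "in" or "at"

print(single_digits)    # ['7', '0', '9', '4', '5', '1', '2']
print(digit_runs)       # ['7', '09', '45', '12']
print(words)            # ['user_7', 'logged', 'in', 'at', '09', '45', 'from', 'host', '12']
print(first_word)       # ['user_7']
print(trailing_digits)  # ['12']
print(either)           # ['in', 'at']
```

Every pattern is written as a raw string (the `r` prefix) so that backslashes such as the one in `\d` reach the regex engine unchanged.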

Common Operations

  • re.search(pattern, string): Finds the first match of the pattern in the string.
  • re.findall(pattern, string): Returns a list of all non-overlapping matches.
  • re.match(pattern, string): Matches only at the beginning of the string.
  • re.sub(pattern, repl, string): Substitutes matches with a replacement string.
  • re.split(pattern, string): Splits the string at occurrences of the pattern.
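
The difference between these functions is easiest to see side by side. The sketch below applies each of them to one hypothetical log line; the line and the variable names are assumptions made for the example.

```python
import re

# Hypothetical log line, invented for illustration.
line = "login failed for user bmoreno from 192.0.2.44"

# re.search() scans the whole string and returns the first match object (or None).
first_number = re.search(r"\d+", line).group()

# re.match() only succeeds if the pattern matches at the very start of the string.
starts_with_digits = re.match(r"\d+", line)           # None: the line starts with "login"
starts_with_login = re.match(r"login", line).group()

# re.findall() returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", line)

# re.sub() replaces every match with the replacement string.
masked = re.sub(r"\d+", "X", line)

# re.split() cuts the string wherever the pattern matches.
fields = re.split(r"\s+", line)

print(first_number)        # 192
print(starts_with_digits)  # None
print(starts_with_login)   # login
print(all_numbers)         # ['192', '0', '2', '44']
print(masked)              # login failed for user bmoreno from X.X.X.X
print(fields)
```

Because `re.search()` and `re.match()` return `None` when nothing matches, code that handles real data would normally check the result before calling `.group()`.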

Example: Extracting Phone Numbers

Python

import re

text = "My phone number is 555-1234. Call me maybe? 555-5678"
phone_numbers = re.findall(r"\d{3}-\d{4}", text)
print(phone_numbers)  # Output: ['555-1234', '555-5678']

Exploring Further

  • Character classes: [abc] matches any of the characters a, b, or c.
  • Quantifiers: {m,n} specifies the minimum and maximum number of repetitions.
  • Lookahead and lookbehind: Assert patterns without including them in the match.
  • Flags: Modify the behavior of regex matching (e.g., case-insensitivity, multiline).
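
As a rough illustration of these four features, the sketch below uses one short example for each. The sample strings and variable names are hypothetical.

```python
import re

# Hypothetical text, invented for illustration.
text = "Error code E42, error code e7, ERROR code E1234"

# Character class plus quantifier: "E" or "e" followed by one to three digits.
codes = re.findall(r"[Ee]\d{1,3}", text)

# Flag: re.IGNORECASE makes the pattern match regardless of letter case.
any_case = re.findall(r"error", text, flags=re.IGNORECASE)

# Lookahead: word characters that are followed by "@", without including the "@".
before_at = re.findall(r"\w+(?=@)", "user1@email1.com")

# Lookbehind: word characters that are preceded by "@", without including the "@".
after_at = re.findall(r"(?<=@)\w+", "user1@email1.com")

print(codes)      # ['E42', 'e7', 'E123']
print(any_case)   # ['Error', 'error', 'ERROR']
print(before_at)  # ['user1']
print(after_at)   # ['email1']
```

Note that `{1,3}` stops after three digits, which is why "E1234" yields "E123" rather than the full code.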

Remember:

  • Regex can be complex, but also powerful and versatile.
  • Practice with different patterns and tools to master regex for effective text processing.
  • Explore online resources and cheat sheets for more patterns and techniques.
Which string matches with the regular expression "b\wa+b"?

“bkaaab”

The string “bkaaab” matches with the regular expression “b\wa+b”. The first character must be “b”. After this, the symbol \w is used to match any alphanumeric character, including “k”. Next, the + symbol specifies that there should be one or more occurrences of the character it follows, which in this case is “a”. Finally, the string must end with “b”.
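
One way to check this reasoning is to test candidate strings in Python. In the sketch below, the candidates other than “bkaaab” are hypothetical, and `re.fullmatch()` is used on the assumption that “matches” means the whole string must fit the pattern.

```python
import re

pattern = r"b\wa+b"

# Hypothetical candidates chosen to exercise each part of the pattern.
candidates = ["bkaaab", "bkaab", "bab", "b!aab"]

# re.fullmatch() returns a match object only if the entire string fits the pattern.
results = {c: bool(re.fullmatch(pattern, c)) for c in candidates}

for candidate, matched in results.items():
    print(candidate, matched)
# bkaaab True   ("k" satisfies \w, "aaa" satisfies a+)
# bkaab True    (a+ accepts any run of one or more "a")
# bab False     (\w consumes the only "a", leaving none for a+)
# b!aab False   ("!" is not matched by \w)
```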

We’ve already learned a lot
about working with strings. This includes working with their
positional indices and slicing them. In the previous video, we applied these to extract the first
three digits from a list of IP addresses. In this video, we’re going to focus on a more advanced
way to search through strings. We’ll learn about searching for patterns
in strings through regular expressions. A regular expression, shortened to regex, is a sequence of characters
that forms a pattern. This pattern can be used when
searching within log files. We can use them to search for
any kind of pattern. For example, we can find all strings
that start with a certain prefix, or we can find all strings
that are a certain length. We can apply this to a security
context in a variety of ways. For example, let’s say we needed to find
all IP addresses with a network ID of 184. Regular expressions would allow us
to efficiently search for this pattern. We’ll examine another example
throughout this video. Let’s say that we want to extract all
the email addresses contained in a log. If we try to do this
through the index method, we would need the exact email
addresses we were searching for. As security analysts,
we rarely have that kind of information. But if we use a regular
expression that tells Python how an email address is structured, it would return all the strings that have
the same elements as an email address. Even if we were given a log file
with thousands of lines and entries, we could extract every
email in the file by searching for the structure of an email address
through a regular expression. We wouldn’t need to know
the specific emails to extract them. Let’s explore the regular expression
symbols that we need to do this. To begin, let’s learn about the plus sign. The plus sign is a regular expression
symbol that represents one or more occurrences of a specific character. Let’s explain that through
an example pattern. The regular expression pattern a+ matches a string of any length
in which “a” is repeated. For example, just a single “a”,
three “a’s” in a row, or five “a’s” in a row.
It could even be 1000 “a’s” in a row. We can start working with a quick example
to see which strings this pattern would extract. Let’s start with this
string of device IDs. These are all the instances of the letter
“a” written once or multiple times in a row. The first instance has one “a”,
the second has two “a’s”, the third one has one “a”, and
the fourth has three “a’s”. So, if we told Python to find matches
to the a+ regular expression, it would return this list of “a’s”. The other building block we
need is the \w symbol. This matches with any
alphanumeric character, but it doesn’t match symbols. “1”, “k”, and “i” are just three examples
of what “\w” matches. Regular expressions can easily
be combined to allow for even more patterns in a search. Before we apply this to our email context,
let’s explore the patterns we can search for
if we combine the “\w” with the plus sign. “\w” matches any alphanumeric character, and the plus sign matches one or more
occurrences of the character before it. This means that
the combination of “\w+” matches an alphanumeric
string of any length. “\w” provides flexibility in the
alphanumeric characters that this regular expression matches, and the plus sign provides flexibility in
the length of the string that it matches. The strings “192”, “abc123”, and
“security” are just three possible strings
that match to “\w+”. Now let’s apply these to extracting
email addresses from a log. Email addresses consist of text
separated by certain symbols, like the @ symbol and the period. Let’s learn how we can represent
this as a regular expression. To start, let’s think about the format
of a typical email address; for example, user1@email1.com. The first segment of an email address
contains alphanumeric characters, and the number of alphanumeric
characters may vary in length. We can use our regular
expression “\w+” for this portion to match to
an alphanumeric string of any length. The next segment in an email
address is the @ symbol. This segment is always present. We’ll enter this directly in our regular
expression. Including this is essential for ensuring that Python distinguishes
email addresses from other strings. After the @ symbol is the domain name. Just like the first segment, this one
varies depending on the email address, but it always contains
alphanumeric characters, so we can use “\w+”
again to allow for this variation. Next, just like the @ symbol, a period is always part of an email
address. But unlike the @ symbol, in regular expressions,
the period has a special meaning. For this reason,
we need to use backslash period here. When we add a backslash in front of it, we let Python know that we are not
intending to use it as an operator, and that our pattern should
include a period in this location. For the last segment,
we can also use “\w+”. This final part of an email
address is often “com” but might be other strings like “net.” When we put the pieces together, we get the regular expression we’ll use
to find email addresses in our log. This pattern will match
all email addresses. It will exclude everything
else in our string. This is because we’ve
included the @ symbol and the period where they appear in
the structure of an email address. Let’s bring this into Python. We’ll use regular expressions to
extract email addresses from a string. Regular expressions can be used when
the re module is imported into Python, so we begin with that step. Later, we’ll learn how to import and
open files like logs. But for now, we’ve stored our log
as a string variable named email_log. Because this is a multi-line string, we’re using three sets of quotation
marks instead of just one. Next, we’ll apply the findall() function
from the re module to a regular expression. re.findall() returns a list
of matches to a regular expression. Let’s use this with the regular expression
we created earlier for email addresses. The first argument is the pattern
that we want to match. Notice that we place
it in quotation marks. The second argument indicates
where to search for the pattern. In this case, we’re searching through
the string contained within the email_log variable. When we run this, we get a list
of all the emails in the string. Imagine applying this to a log with
thousands of entries. Pretty useful, right? This was just an introduction to
the power of regular expressions. There are many more symbols you can use. I encourage you to explore regular
expressions on your own and learn more.
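
The building blocks from this video can be tried out with a short sketch like the one below. The transcript does not include the string of device IDs shown on screen, so `device_ids` here is a hypothetical stand-in built to produce the same sequence of matches the video describes.

```python
import re

# Hypothetical device IDs; the string used in the video is not in the transcript.
device_ids = "r36a k2aa91 a3nd9 baaa41"

# a+ matches each run of one or more "a" characters.
a_runs = re.findall(r"a+", device_ids)
print(a_runs)  # ['a', 'aa', 'a', 'aaa']

# \w+ matches alphanumeric strings of any length.
tokens = re.findall(r"\w+", "192 abc123 security")
print(tokens)  # ['192', 'abc123', 'security']

# An unescaped period matches any character, so it also accepts a string with no period.
loose = re.findall(r"\w+@\w+.\w+", "user1@email1Xcom")
print(loose)   # ['user1@email1Xcom']

# Escaping the period with a backslash requires a literal "." at that position.
strict = re.findall(r"\w+@\w+\.\w+", "user1@email1Xcom")
print(strict)  # []
```

The last two calls show why the backslash before the period matters: without it, strings that are not email addresses can slip through.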

Reading: More about regular expressions


Lab: Activity: Exemplar: Use regular expressions to find patterns

Practice Quiz: Test your knowledge: Regular expressions

Which regular expression symbol represents one or more occurrences of a specific character?

As a security analyst, you are responsible for finding employee IDs that end with the character and number sequence “a6v”. Given that employee IDs consist of both numbers and alphabetic characters and are at least four characters long, which regular expression pattern would you use?

You have imported the re module into Python with the code import re. You want to use the findall() function to search through a string. Which function call enables you to search through the string contained in the variable text in order to return all matches to a regular expression stored in the variable pattern?

Which of the following strings would Python return as matches to the regular expression pattern “\w+”? Select all that apply.

Review: Work with strings and lists


Video: Wrap-up

Congratulations, Security Sleuths!

In this whirlwind of a module, we conquered some key concepts:

  • String & List Power: We wielded methods to manipulate these crucial data types, extracting valuable information with precise control.
  • Algorithmic Adventure: We crafted a nifty algorithm to slice network IDs from IP address lists, flexing our programming muscles.
  • Pattern Prowess: We unlocked the mysteries of regular expressions, learning to search for hidden patterns like seasoned data detectives.

These complex tools now sit in your security toolbox, making you a more skilled data wrangler and algorithm architect. Remember, you can always revisit the videos to sharpen your skills.

This is just the beginning of your Python journey in the realm of security analysis. Buckle up for more practice and unleash the full potential of Python for your future crime-fighting endeavors!

Congratulations! We
accomplished a lot together. Let’s take time to
quickly go through all the new concepts we covered. We started this
course by focusing on working with strings and lists. We learned methods that work specifically with
these data types. We also learned to work with indices and extract
information we need. Next, we focused on
writing algorithms. We wrote a simple
algorithm that sliced the network ID from a
list of IP addresses. Finally, we covered using
regular expressions. Regular expressions allow
you to search for patterns, and this provides
expanded ways to locate what you need in
logs and other files. These are complex concepts,
and you’re always welcome to visit the videos
again whenever you like. With these concepts, you took a big step towards being able to work with data and write the algorithms that security
professionals need. Throughout the rest
of this course, you’re going to get
more practice with Python and what it can
offer to security analysts.

Reading: Reference guide: Python concepts from module 3

Reading: Glossary terms from module 3

Terms and definitions from Course 7, Module 3

Quiz: Module 3 challenge

What is the output of the following code?
print(len("125"))

Which line of code returns a copy of the string “bmoreno” as “BMORENO”?

What is the index of the character “c” in the string “encryption”?

You need to take a slice from an employee ID. Specifically, you must extract the characters with indices of 3, 4, 5, and 6. Complete the Python code to take this slice and display it. (If you want to undo your changes to the code, you can click the Reset button.)

Which code joins a list of new_users to a list of approved_users and assigns the value to a third variable named users?

A variable named my_list contains the list [1,2,3,4]. Which line of code removes the last element in the list?

Fill in the blank: Determining that you need to use string slicing and a for loop to extract information from items in a list is part of creating a(n) _____.

Which of the following strings would Python return as matches to the regular expression of “\w+”? Select all that apply.

You have imported the re module into Python with the code import re. Which code searches the device_ids string variable for a pattern of “r15\w+”?

What does the code username_list.append("bmoreno") do?