Split a string

Split a string in one place by slicing

We will split the following string at index 3, the position of the leftmost space. (Indices start at 0, so the characters before the space have indices 0, 1, and 2.) See also Common Sequence Operations.

"""
Tell if a Fifth Avenue address is on the east or west side of the avenue.
"""

import sys

address = "350 Fifth Avenue"   #Empire State Building

try:
    i = address.index(" ")
except ValueError as error:
    print(error)   #There was no space.
    sys.exit(1)

number = address[:i]   #the street number
rest   = address[i:]   #the rest of the string

try:
    number = int(number)
except ValueError as error:
    print(error)
    sys.exit(1)

if number % 2 == 0:
    side = "west"
else:
    side = "east"

print(f"{address} is on the {side} side of Fifth Avenue.")
sys.exit(0)

350 Fifth Avenue is on the west side of Fifth Avenue.

Change the if statement to one of the following conditional expressions.

side = "west" if number % 2 == 0 else "east"
side = "east" if number % 2 else "west"
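The slices in the program above can be examined in isolation. This is a minimal sketch, not part of the original program. The variable street is a name introduced here for illustration. It shows that address[i:] still begins with the space, and that adding 1 to the index skips it.

```python
address = "350 Fifth Avenue"

i = address.index(" ")     #i is 3, the index of the leftmost space.

number = address[:i]       #characters 0, 1, 2
rest = address[i:]         #everything from the space onward
street = address[i + 1:]   #hypothetical name: everything after the space

print(repr(number))   #'350'
print(repr(rest))     #' Fifth Avenue'
print(repr(street))   #'Fifth Avenue'
```

repr is used so that the leading space is visible in the output.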

Split a string in many places with split

See Escape Sequences. What happens if the sentence is ""? What happens if the sentence contains no whitespace?

sentence = "The quick  brown\tfox\njumped\rover\r\nthe\flazy dog."
words = sentence.split()   #words is a list of strings.

for word in words:
    print(word)

The
quick
brown
fox
jumped
over
the
lazy
dog.
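One way to answer the two questions above is to try them. The sketch below assumes split is called with no arguments, as in the example. The variable names are introduced here for illustration.

```python
empty = "".split()         #no words at all
blank = " \t\n".split()    #only whitespace: still no words
single = "Fifth".split()   #no whitespace: the whole string is one word

print(empty)    #[]
print(blank)    #[]
print(single)   #['Fifth']
```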

Split a string with maxsplit and rsplit

address = "350 Fifth Avenue"           #Empire State Building
sections = address.split(maxsplit = 1) #also try rsplit

for section in sections:
    print(section)

350
Fifth Avenue
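The comment above suggests trying rsplit. As a small sketch of the difference: with maxsplit = 1, split makes its single cut at the leftmost whitespace and rsplit makes it at the rightmost. The names left and right are introduced here for illustration.

```python
address = "350 Fifth Avenue"

left = address.split(maxsplit = 1)     #cut at the leftmost space
right = address.rsplit(maxsplit = 1)   #cut at the rightmost space

print(left)    #['350', 'Fifth Avenue']
print(right)   #['350 Fifth', 'Avenue']
```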

Split a string into three substrings with partition
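A minimal sketch of partition, reusing the address from the earlier examples. The variable names are assumptions made for illustration. partition returns a tuple of three strings: the part before the first occurrence of the separator, the separator itself, and the part after it. If the separator is absent, the last two strings are empty, so no exception needs to be caught.

```python
address = "350 Fifth Avenue"

number, separator, rest = address.partition(" ")
print(number)   #350
print(rest)     #Fifth Avenue

#When the separator is not found, partition still returns three strings.
print("Broadway".partition(" "))   #('Broadway', '', '')

#rpartition splits at the last occurrence of the separator instead.
print(address.rpartition(" "))     #('350 Fifth', ' ', 'Avenue')
```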

Split a string into substrings of equal length with textwrap.wrap

The three pairs of hexadecimal digits represent three numbers: the amounts of red, green, and blue to mix together to produce the given color (turquoise).

import textwrap

turquoise = "40E0D0"

#colors is a list of 3 strings, each containing 2 characters.
colors = textwrap.wrap(turquoise, 2)

for color in colors:
    print(color)

40
E0
D0

Change the call to print in the for loop to the following.

    print(f"Hexadecimal {color} is decimal {int(color, 16)}.")

Hexadecimal 40 is decimal 64.
Hexadecimal E0 is decimal 224.
Hexadecimal D0 is decimal 208.
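textwrap.wrap is designed for wrapping text, so it breaks lines at whitespace when the string contains any. As an alternative (a swapped-in technique, not the method used above), the sketch below cuts the string into pieces of length n by slicing. The names n, red, green, and blue are introduced here for illustration.

```python
turquoise = "40E0D0"
n = 2   #length of each piece

#Slice at indices 0, 2, 4.  The last piece is shorter if len is not a multiple of n.
colors = [turquoise[i:i + n] for i in range(0, len(turquoise), n)]
red, green, blue = (int(color, 16) for color in colors)

print(colors)             #['40', 'E0', 'D0']
print(red, green, blue)   #64 224 208
```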

Split a string of comma-separated values with the csv module

This example calls next only once because [record] is a list containing only one string.

import csv

record = 'Smith,"John, Jr.","""Johnny""",NY,10003,,212'
reader = csv.reader([record])   #[record] is a list containing one string
fields = next(reader)           #fields is a list containing seven strings

for field in fields:
    print(field)

Smith
John, Jr.
"Johnny"
NY
10003

212
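When there is more than one record, a for loop can take the place of the single call to next. The sketch below assumes a list of two short records. The second record is invented here purely for illustration.

```python
import csv

records = [
    'Smith,"John, Jr.",NY',
    'Doe,Jane,NJ'            #hypothetical second record
]

rows = list(csv.reader(records))   #rows is a list of lists of strings

for fields in rows:
    print(fields)
```

Each pass through the loop prints one list of fields. The comma inside the quoted field "John, Jr." does not split that field.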

Split a multiline string into lines with splitlines

#A string containing three lines.  Each line ends with a newline character.

lines = """\
John Smith
100 Sunnyside Drive
New York, NY 10010
"""

for line in lines.splitlines():
    print(line)   #line is a string that does not end with a newline character.

John Smith
100 Sunnyside Drive
New York, NY 10010
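For comparison, here is a small sketch of how splitlines differs from split("\n") on the same three lines, and of the keepends argument. The names a, b, and c are introduced here for illustration.

```python
lines = "John Smith\n100 Sunnyside Drive\nNew York, NY 10010\n"

a = lines.splitlines()                  #three strings, no newlines
b = lines.split("\n")                   #four strings: the last one is empty
c = lines.splitlines(keepends = True)   #three strings, each ending with a newline

print(len(a), len(b), len(c))   #3 4 3
print(repr(b[-1]))              #''
print(repr(c[0]))               #'John Smith\n'
```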

Raw string

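A raw string literal is written with the prefix r, and each backslash in it stands for itself instead of starting an escape sequence. This notation lets us write a regular expression with fewer backslashes.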

print(len("\n"))   #A one-character string.  The character is a newline.
print(len("\\n"))  #A two-character string.  The characters are backslash and n.
print(len(r"\n"))  #A two-character string.  The characters are backslash and n.
print(r"\n")

1
2
2
\n
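To illustrate the saving in backslashes, the sketch below writes the same regular expression twice, once as an ordinary string and once as a raw string. The pattern and the sample text are assumptions made for illustration.

```python
import re

plain = "\\d{3}-\\d{4}"   #ordinary string: each backslash must be doubled
raw = r"\d{3}-\d{4}"      #raw string: each backslash is written once

print(plain == raw)   #True

found = re.findall(raw, "Call 234-5678 or 555-0100.")
print(found)          #['234-5678', '555-0100']
```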

Split a string with a regular expression

See + and parentheses.

import re   #regular expressions

record = "John Smith, 212-234-5678"

m = re.match(r"(\w+) (\w+), (\d{3})-(\d{3})-(\d{4})", record)   #m is a Match, or None if record does not match.

firstName = m.group(1)   #firstName is a string
lastName = m.group(2)    #lastName is a string
fullName = m.group(1, 2) #fullName is a tuple containing two strings

areaCode = m.group(3)    #areaCode is a string containing three digits
prefix = m.group(4)      #prefix is a string containing three digits
lineNumber = m.group(5)  #lineNumber is a string containing four digits
phoneNumber = m.group(3, 4, 5) #phoneNumber is a tuple containing three strings

print(firstName, lastName)
print(fullName)

print(areaCode, prefix, lineNumber)
print(phoneNumber)

John Smith
('John', 'Smith')
212 234 5678
('212', '234', '5678')
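re.match returns None when the string does not fit the pattern, and calling group on None raises an AttributeError. A minimal sketch of one way to guard against this, using the same pattern as above, follows. The function parse and the name pattern are hypothetical, introduced here for illustration.

```python
import re

pattern = r"(\w+) (\w+), (\d{3})-(\d{3})-(\d{4})"

def parse(record):
    "Return a tuple of the five fields, or None if record does not match."
    m = re.match(pattern, record)
    if m is None:
        return None
    return m.groups()   #a tuple containing all five groups

good = parse("John Smith, 212-234-5678")
bad = parse("John Smith")   #no phone number, so no match

print(good)   #('John', 'Smith', '212', '234', '5678')
print(bad)    #None
```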