Python – Strings

This post is the eighth of many that attempt to document everything I have been learning about Python.

Strings are ordered collections of single characters and belong to the family of sequences. Python has no separate type for single characters; a single character is simply a string of length 1.

You create a string by enclosing text in either single or double quotes.

You can run built-in sequence operations on strings.

>>> cartmans_song = 'The lazy river has never been lazier'
>>> # Built-in sequence operations
>>> len(cartmans_song)
36

Strings can have escape sequences embedded in them. An escape sequence starts with a \ backslash. Python replaces each escape sequence with the single character it stands for, so \n in a literal becomes one newline character (code point 10).

>>> # \n is a newline escape sequence
>>> str_with_escape_char = 'the lazy river \nhas never been lazier'
>>> print(str_with_escape_char)
the lazy river
has never been lazier
>>>
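A quick way to confirm that an escape sequence turns into a single character is to check it with len() and ord(). This is a minimal sketch; the variable names are only for illustration.

```python
# '\n' takes two characters to write in source code, but the string
# it produces holds exactly one newline character.
newline = '\n'
print(len(newline))  # 1
print(ord(newline))  # 10, the code point of the newline character

# A few other common escape sequences
tab = '\t'              # horizontal tab
backslash = '\\'        # one literal backslash
quote = 'Cartman\'s'    # escaped quote inside a single-quoted string
print(len(tab), len(backslash))  # 1 1
print(quote)                     # Cartman's
```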

Raw strings are useful when you don't want a backslash to be interpreted as the start of an escape sequence, for example in Windows paths or regular expressions. To make a string raw, add an r or R before the opening quote. A raw string cannot end in a single backslash.

>>> raw_string = r'This is a raw string. Escape chars will be \n ignored'
>>> print(raw_string)
This is a raw string. Escape chars will be \n ignored
>>>
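The difference is easiest to see by comparing lengths, and the trailing-backslash rule can be checked with compile(). A small sketch; the Windows-style paths are made-up examples, and appending '\\' from a normal string is just one common workaround.

```python
# In a normal string '\n' is one newline; in a raw string the backslash
# and the 'n' stay as two separate characters.
normal = 'C:\new_folder'
raw = r'C:\new_folder'
print(len(normal))  # 12
print(len(raw))     # 13

# r'C:\temp\' is a SyntaxError: the final backslash stops the closing
# quote from ending the literal. compile() lets us show that safely.
try:
    compile("r'C:\\temp\\'", '<example>', 'eval')
except SyntaxError:
    print('SyntaxError: a raw string cannot end in a single backslash')

# Workaround: add the trailing backslash from a normal (escaped) string.
folder = r'C:\temp' + '\\'
print(folder)  # C:\temp\
```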

Triple quotes are used to write multiline strings. You can use either three single quotes or three double quotes. Line breaks inside the literal become part of the string, so when you print it, it is displayed with those line breaks.

>>>
>>> triple_quoted_string = """Eric Cartman
... Stan Marsh
... Kyle Broflovski
... Kenny McCormick"""
>>>
>>> triple_quoted_string
'Eric Cartman\nStan Marsh\nKyle Broflovski\nKenny McCormick'
>>>
>>> print(triple_quoted_string)
Eric Cartman
Stan Marsh
Kyle Broflovski
Kenny McCormick
>>>
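To make the "line breaks become \n" point concrete, here is a small sketch comparing a triple-quoted literal with the equivalent single-line literal. The second string is an invented example showing that triple quotes also let you mix both quote characters without escaping.

```python
roster = """Eric Cartman
Stan Marsh
Kyle Broflovski
Kenny McCormick"""

# The triple-quoted literal is exactly the same value as this one.
print(roster == 'Eric Cartman\nStan Marsh\nKyle Broflovski\nKenny McCormick')  # True
print(roster.count('\n'))  # 3 line breaks for 4 lines

# Both ' and " can appear inside triple quotes without backslashes.
line = """Cartman said "Respect my authoritah!" and didn't mean it."""
print(line)
```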

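When Python echoes a string (using repr()), it shows non-printable characters as escape sequences: common ones such as newline and tab get named escapes like \n and \t, and the rest are shown in hex notation such as \x07.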
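Calling repr() directly shows what the interactive prompt would echo. A minimal sketch; the bell character '\x07' is just a convenient non-printable example.

```python
# Named escapes are used for common control characters ...
tab_repr = repr('lazy\triver\n')
print(tab_repr)   # 'lazy\triver\n'

# ... and hex notation for other non-printable characters.
bell_repr = repr('\x07')
print(bell_repr)  # '\x07'
```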

Basic String Operations

len() returns the length of a string.

+ concatenates strings and * repeats them; these operators are overloaded for the string type.

in checks for membership. It is similar to the find() string method, but instead of returning an offset it returns a bool.

>>> cartmans_song = 'the lazy river has never been lazier'
>>>
>>> # use the in membership operator to search for a string within another string
>>> 'lazy' in cartmans_song
True
>>>
>>> cartmans_song = 'The lazy river has never been lazier'
>>> # gets the length of the string
>>> len(cartmans_song)
36
>>> # concatenate
>>> 'New England' + ' Patriots'
'New England Patriots'
>>>
>>> # repeat
>>> 'But maa ' * 4
'But maa But maa But maa But maa '
>>>
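The contrast between in and find() is easiest to see side by side. A small sketch; 'creek' is just a word that does not occur in the song.

```python
cartmans_song = 'the lazy river has never been lazier'

# 'in' only answers yes or no ...
found = 'lazy' in cartmans_song
print(found)                        # True

# ... while find() returns the offset of the first match, or -1 if absent.
print(cartmans_song.find('lazy'))   # 4
print(cartmans_song.find('creek'))  # -1
```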

Indexing

You can retrieve a specific character from a string by its index. Offsets start at 0. A negative index counts from the right end instead; the rightmost character is at offset -1.

>>> cartmans_song = 'The lazy river has never been lazier'
>>> # gets the character at the first offset
>>> cartmans_song[0]
'T'
>>> # gets the character at the last offset
>>> cartmans_song[-1]
'r'
>>>
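One way to think about a negative index i is as shorthand for len(s) + i. This sketch checks that equivalence and shows that indexing past the end raises IndexError.

```python
cartmans_song = 'The lazy river has never been lazier'

# A negative index i refers to the same position as len(s) + i.
last = cartmans_song[-1]
also_last = cartmans_song[len(cartmans_song) - 1]
print(last, also_last)    # r r
print(cartmans_song[-6])  # l  (first letter of 'lazier')

# Indexing outside the string raises IndexError.
try:
    cartmans_song[36]
except IndexError as err:
    print('IndexError:', err)  # IndexError: string index out of range
```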

Slicing

Slicing extracts a run of characters (a substring) from a string.

S[i:j] starts at offset i and returns all characters up to, but not including, offset j. You can omit either bound: the lower bound defaults to 0 and the upper bound defaults to the length of the string.

>>> cartmans_song = 'The lazy river has never been lazier'
>>> cartmans_song[4:30]
'lazy river has never been '
>>> cartmans_song[:30]
'The lazy river has never been '
>>> cartmans_song[4:]
'lazy river has never been lazier'
>>>
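Slice bounds accept negative offsets too, and unlike plain indexing, a slice quietly clamps out-of-range bounds. A short sketch on the same sentence; the split point i = 9 is arbitrary.

```python
cartmans_song = 'The lazy river has never been lazier'

# Negative bounds count from the right, just like negative indexes.
print(cartmans_song[-6:])   # lazier
print(cartmans_song[:-7])   # The lazy river has never been

# Out-of-range bounds are clamped instead of raising IndexError.
print(cartmans_song[30:100])      # lazier
print(repr(cartmans_song[50:]))   # ''

# s[:i] + s[i:] always rebuilds the original string.
i = 9
print(cartmans_song[:i] + cartmans_song[i:] == cartmans_song)  # True
```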

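If you slice without specifying either bound, you get a string with the same value as the one you are slicing. With a mutable sequence such as a list, S[:] creates a new copy; because strings are immutable, CPython does not bother copying and simply returns the original string object.

>>> # same value; on CPython this is the very same object
>>> new_str = cartmans_song[:]
>>> new_str
'The lazy river has never been lazier'
>>> new_str is cartmans_song
True
>>>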
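The [:] idiom matters most for lists, where it really does make a copy. This sketch contrasts the two; the identity result for strings is a CPython implementation detail, so the comments mark it as such and code should not rely on it.

```python
cartmans_song = 'The lazy river has never been lazier'
friends = ['Stan', 'Kyle', 'Kenny']

# For a list, [:] builds a new (shallow) copy.
friends_copy = friends[:]
print(friends_copy == friends)  # True: same contents
print(friends_copy is friends)  # False: a different list object

# For a string, the value is always the same; CPython even returns
# the same object, but other implementations may not.
song_slice = cartmans_song[:]
print(song_slice == cartmans_song)  # True
print(song_slice is cartmans_song)  # True on CPython
```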

Slicing using 3 indexes

In S[i:j:k], k is the step: Python starts at offset i and takes every kth character up to, but not including, offset j. With a negative step the slice runs from right to left, so i should be the higher offset and j the lower one. Omitting both i and j and using -1 as the step reverses the whole string.

>>> cartmans_song = 'The lazy river has never been lazier'
>>> # start from the beginning and take every 4th character
>>> cartmans_song[0:36:4]
'Tl eae nz'
>>> # a negative step walks right to left; 'T' at offset 0 is excluded because j is exclusive
>>> cartmans_song[36:0:-1]
'reizal neeb reven sah revir yzal eh'
>>> cartmans_song[::-1]
'reizal neeb reven sah revir yzal ehT'
>>>

You can also index with a slice object to get a slice:

>>> cartmans_song[slice(1, 4)]
'he '
>>> cartmans_song[slice(None, None)]
'The lazy river has never been lazier'
>>>
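A slice object can be stored and reused, and its indices() method shows the concrete start, stop and step Python works out for a sequence of a given length, which explains how omitted bounds behave with a negative step. The is_palindrome helper below is hypothetical, included only to show [::-1] in use.

```python
cartmans_song = 'The lazy river has never been lazier'

# With a negative step, the first bound is the higher offset.
print(cartmans_song[35:29:-1])  # reizal

# A named slice object can be reused like any other value.
last_word = slice(30, None)
print(cartmans_song[last_word])  # lazier

# indices(length) reveals the bounds Python actually uses.
print(slice(None, None, -1).indices(len(cartmans_song)))  # (35, -1, -1)
print(slice(4, 100).indices(len(cartmans_song)))          # (4, 36, 1)


def is_palindrome(word):
    # Hypothetical helper: a word is a palindrome if it equals its reverse.
    return word == word[::-1]


print(is_palindrome('kayak'))  # True
```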

Conversions

To convert an object to a string, pass it to the built-in str() function.

To get the numeric code of a character, pass it to the built-in ord() function. For ASCII characters this is the ASCII value; in general it is the character's Unicode code point.

To convert a number back to the corresponding character, pass it to the built-in chr() function.

You can also use string formatting to generate strings.

>>>
>>> # convert a number to a string
>>> str(42)
'42'
>>>
>>> # get the ASCII value of a character
>>> ord('a')
97
>>>
>>> # get the character for an ASCII value
>>> chr(97)
'a'
>>>
>>> # string formatting with the format() method
>>> '{0} is a string'.format('this')
'this is a string'
>>>
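Since ord() and chr() are inverses, they can be combined to do arithmetic on characters. This sketch also shows that both work beyond ASCII. The shift_letters function is a hypothetical helper (a simple Caesar shift on lowercase ASCII letters), not something from the standard library.

```python
# str() and int() convert between numbers and their text form.
print(str(36) + ' characters')  # 36 characters
print(int('36') + 1)            # 37

# ord() and chr() are inverses and work on Unicode code points.
print(ord('a'))   # 97
print(ord('é'))   # 233, outside ASCII but still a valid code point
print(chr(8364))  # €


def shift_letters(text, offset):
    """Hypothetical helper: rotate lowercase ASCII letters by offset."""
    shifted = []
    for ch in text:
        if 'a' <= ch <= 'z':
            # Map 'a'..'z' to 0..25, rotate, then map back to a character.
            shifted.append(chr((ord(ch) - ord('a') + offset) % 26 + ord('a')))
        else:
            shifted.append(ch)
    return ''.join(shifted)


print(shift_letters('lazy river', 1))  # mbaz sjwfs
```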

String methods

Built-in operators and functions such as len() work on many types, but string methods can only be called on strings.

Some string methods:

find() – returns the lowest offset at which the substring is found, or -1 if it is not found.

replace() – replaces all occurrences of the first argument with the second argument and returns the result as a new string. If you pass a third argument, only that many occurrences are replaced.

join() – takes any iterable of strings and joins its items, using the string it is called on as the delimiter.

split() – splits a string into a list using the delimiter that is passed in; with no argument, it splits on runs of whitespace.

>>>
>>> new_string = 'this is a new string'
>>>
>>> # find the offset in the string that matches the argument
>>> new_string.find('is')
2
>>> # replace one value with another
>>> new_string.replace('string', 'simple string')
'this is a new simple string'
>>>
>>> # join an iterable, using the string join is called on as the delimiter
>>> '.'.join('string')
's.t.r.i.n.g'
>>>
>>> # upper case
>>> new_string.upper()
'THIS IS A NEW STRING'
>>>
>>> # check whether all the characters are alphabetic (the spaces are not)
>>> new_string.isalpha()
False
>>> cartmans_song = 'the lazy river has never been lazier'
>>>
>>> # split the string using a single space as the delimiter
>>> cartmans_song.split(' ')
['the', 'lazy', 'river', 'has', 'never', 'been', 'lazier']
>>>
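A few more details of these methods are worth seeing in action: the count argument of replace(), split() with no argument, join() as the inverse of split(), and the fact that none of them modify the original string. A small sketch with made-up inputs.

```python
cartmans_song = 'the lazy river has never been lazier'

# The optional third argument limits how many matches are replaced.
print('But maa But maa'.replace('maa', 'ma', 1))  # But ma But maa

# With no argument, split() splits on runs of whitespace and ignores
# leading and trailing whitespace.
words = '  the   lazy river '.split()
print(words)  # ['the', 'lazy', 'river']

# join() reverses a split; every item must already be a string.
print('-'.join(words))                           # the-lazy-river
print(', '.join(str(n) for n in [1, 2, 3]))      # 1, 2, 3

# Strings are immutable: methods return new strings.
shouted = cartmans_song.upper()
print(cartmans_song)  # the lazy river has never been lazier
```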

In Python 2.x, the string module provided functions for most of these operations (for example, string.upper(s)). Those functions were removed in 3.0, so always use the string methods instead (s.upper()). The string module itself has been retained because it contains other string processing tools, such as character constants and the Template class.
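Here is a brief sketch of what the string module still offers in Python 3: a couple of its character constants and the Template class for simple $-based substitution. The template text is an invented example.

```python
import string

# Character constants defined by the string module
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# Template performs simple $name substitution.
greeting = string.Template('$name lives in $town')
print(greeting.substitute(name='Eric Cartman', town='South Park'))
# Eric Cartman lives in South Park
```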

String formatting expressions

A formatting expression uses the % operator. The string on the left contains conversion specifiers such as %s and %d as placeholders; the right-hand side is a single value, or a tuple when there are multiple values.
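A minimal sketch of formatting expressions with one value, with a tuple of values, and with width and precision in the specifiers. The last line shows a common gotcha: a value that is itself a tuple must be wrapped in a one-element tuple.

```python
# A single value can stand alone on the right of %.
print('Hello %s' % 'Kyle')                # Hello Kyle

# Multiple values go in a tuple, matched to specifiers left to right.
print('%s has %d friends' % ('Eric', 3))  # Eric has 3 friends

# Specifiers can carry a width, alignment flag and precision.
print('%.2f' % 3.14159)                   # 3.14
print('[%5d]' % 42)                       # [   42]
print('[%-5s]' % 'Eric')                  # [Eric ]

# To format a tuple as one value, wrap it; '%s' % (1, 2) would raise
# TypeError because Python sees two values for one specifier.
print('%s' % ((1, 2),))                   # (1, 2)
```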

You can also pull values from a dictionary by putting the key in parentheses inside each specifier:

>>> '%(first)s %(second)d' % {'first': 'eric', 'second': 5}
'eric 5'
>>>

Called with no arguments, the built-in function vars() returns a dictionary of the variables in the current local scope (just like locals()). You can use it on the right-hand side of % to fill named specifiers from your variables.
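A small sketch of both forms of vars(). The describe function and Kid class are hypothetical examples: inside a function, vars() holds that call's local variables, and vars(obj) returns the object's attribute dictionary.

```python
def describe(name, age):
    # With no argument, vars() returns this call's local namespace,
    # here {'name': ..., 'age': ...}, so the named specifiers resolve.
    return '%(name)s is %(age)d years old' % vars()


class Kid:
    """Hypothetical class; vars(instance) returns its __dict__."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


print(describe('Eric', 10))                           # Eric is 10 years old
print('%(name)s is %(age)d' % vars(Kid('Kyle', 10)))  # Kyle is 10
```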

String formatting methods

The format() method, new in Python 2.6 and 3.0, is another way to create strings. Replacement fields in braces refer to positional or named (keyword) arguments, and they can also index into those arguments or access their attributes.

>>>
>>> # positional string formatting
>>> "{0} and {1}".format("Eric", "Stan")
'Eric and Stan'
>>>
>>> # named string formatting
>>> "{first} and {second}".format(first="Eric", second="Stan")
'Eric and Stan'
>>>
>>> # index into a positional argument with a dictionary key
>>> "{0[eric]}".format({"eric": "cartman"})
'cartman'
>>>
>>> # string formatting using object attributes (output depends on your platform)
>>> import sys
>>> "{0.platform}".format(sys)
'win32'
>>>
>>> # variant using named arguments, a dictionary key and an attribute
>>> "{names[SouthPark]}, {sys.platform}".format(names={"SouthPark": "Eric Cartman"}, sys=sys)
'Eric Cartman, win32'
>>>

Inside the square brackets you can use list indexes or dictionary keys; keys are written without quotes.

You cannot use negative offsets inside the brackets of a format string template: something like {0[-1]} is treated as the string key '-1', not as an offset from the end.
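This sketch shows how the bracket lookup behaves: all-digit contents become an integer index, anything else is used as a string key, which is why a negative offset fails on a list. Computing the value outside the template is one simple workaround.

```python
friends = ['Stan', 'Kyle', 'Kenny']
voices = {'eric': 'cartman'}

print('{0[1]}'.format(friends))    # Kyle     (digits -> integer index)
print('{0[eric]}'.format(voices))  # cartman  (unquoted dictionary key)

# '-1' is not all digits, so it is looked up as the string key '-1';
# indexing a list with a string raises TypeError.
try:
    '{0[-1]}'.format(friends)
except TypeError as err:
    print('TypeError:', err)

# Workaround: do the negative indexing outside the template.
print('{0}'.format(friends[-1]))   # Kenny
```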

After a colon in a replacement field, a format spec controls alignment and justification (< for left, > for right, ^ for center), the fill character, the field width and the precision.
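A few format specs side by side. The brackets are only there to make the padding visible.

```python
name = 'Eric'

print('[{0:<10}]'.format(name))      # [Eric      ]  left-aligned, width 10
print('[{0:>10}]'.format(name))      # [      Eric]  right-aligned
print('[{0:^10}]'.format(name))      # [   Eric   ]  centered
print('[{0:*^10}]'.format(name))     # [***Eric***]  centered, '*' as fill

print('{0:.2f}'.format(3.14159))     # 3.14        two decimal places
print('[{0:8.3f}]'.format(3.14159))  # [   3.142]  width 8, three decimals
```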

The built-in format() function calls the object's own __format__() method, the same hook that the format() string method uses for each replacement field.
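To illustrate the __format__() hook, here is a sketch with a hypothetical Temperature class that understands a made-up 'F' spec. Both format() and the format() string method hand it the text after the colon.

```python
# The built-in format() and the format() method agree for built-in types.
print(format(3.14159, '.2f'))       # 3.14
print('{0:.2f}'.format(3.14159))    # 3.14


class Temperature:
    """Hypothetical class with a custom 'F' format spec."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # spec is whatever follows ':' in the replacement field ('' if none).
        if spec == 'F':
            return '{0:.1f}F'.format(self.celsius * 9 / 5 + 32)
        return '{0:.1f}C'.format(self.celsius)


t = Temperature(21.5)
print(format(t, 'F'))           # 70.7F
print('{0:F} / {0}'.format(t))  # 70.7F / 21.5C
```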

In 3.1 and later, you can omit the positional indexes altogether ("{} and {}"), and Python numbers the fields automatically from left to right.

In 3.1 and later, you can also put a comma in the format spec to separate thousands with commas:

>>> nu = 9999
>>> test = "{0:,d}".format(nu)
>>> test
'9,999'
>>>
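A short sketch of automatic field numbering and the comma option, including the rule that manual and automatic numbering cannot be mixed in one template.

```python
# Automatic numbering: {} fields are filled left to right.
print('{} and {}'.format('Eric', 'Stan'))  # Eric and Stan

# The comma option combines with other parts of the format spec.
print('{:,}'.format(1234567))              # 1,234,567
print('{:,.2f}'.format(1234567.891))       # 1,234,567.89

# Mixing '{0}' and '{}' in one template raises ValueError.
try:
    '{0} and {}'.format('Eric', 'Stan')
except ValueError as err:
    print('ValueError:', err)
```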
