# 09: Lists and Tuples

## Learning Outcomes

- Declare and use both
`list`

and`tuple`

- Differences between
`list`

and`tuple`

- Some
`list`

operations - Mutable vs. immutable types
- numpy arrays
- vector operations on numpy arrays

### Contents

- This will become a table of contents (this text will be scraped).

`tuple`

types

A `tuple`

consists of a number of values separated by commas, for instance:

1
2
3

>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345

1
2

>>> t
(12345, 54321, 'hello!')

They may be nested

1
2
3

>>> u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))

As you see, on output tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly; they may be input with or without surrounding parentheses

**Note** When you create a tuple of one item you must always include a trailing comma like

1
2

>>> (1,)
(1,)

else it doesn’t create a tuple

1
2

>>> (1)
1

empty tuples can be created without a comma

1
2

>>> ()
()

this is also the same with an empty list

## Indexing

Access items by referring to the index number (starting from `0`

)

1
2
3

>>> t = 12345, 54321, 'hello!', 'goodbye!'
>>> t[1]
54321

Negative indices start from the back and in the reverse

1
2

>>> t[-1], t[-2], t[-3], t[-len(t)]
('goodbye!', 'hello!', 54321, 12345)

**Note** `t[-0] == t[0]`

so the reverse indexing actually starts at 1

Ranges of indices can be specified by the function `slice`

which specifies the slice **from and including** the start index **to but not including** the end index

1
2
3

>>> a_tuple = 0, 1, 2, 3, 4, 5, 6, 7,
>>> a_tuple[slice(0, 2)]
(0, 1)

An optional thirs argument (the step) can also be specified like

1
2

>>> a_tuple[slice(0, None, 2)]
(0, 2, 4, 6)

Here we gave a step of `2`

so it starts at the 0th item and returns every 2nd index until the end.

In the `slice`

object providing `None`

gives the default behaviour which is to start at 0 and slice to the end with a step of 1… This is given by

1
2

>>> a_tuple[slice(None, None, None)]
(0, 1, 2, 3, 4, 5, 6, 7)

In reality no-one really uses `slice`

as there is a shorthand for it: The colon `:`

!

When used between square brackets as an indexing operation is effectively separates what would otherwise be a call to the `slice1`

operator for example

1
2

>>> a_tuple[3:]
(3, 4, 5, 6, 7)

remember slices are up until (NOT INCLUSIVE) of the end index

1
2

>>> a_tuple[-4:-1]
(4, 5, 6)

this is the same as `slice(0, 5, 2)`

1
2

>>> a_tuple[0:5:2]
(0, 2, 4)

this is the same as `slice(1, None, 2)`

1
2

>>> a_tuple[1::2]
(1, 3, 5, 7)

1
2

>>> a_tuple[::-2] # same as slice(None, None, -2)
(7, 5, 3, 1)

### A side note on `range`

We have used `range`

a few times. It is worth pointing out that its arguments operate in exactly the same way as `slice`

. Except, instead of returning an index slice, it returns actual index values as a special `range`

object.

1
2
3
4

>>> a = range(5)
>>> for i in a:
... print(i, end=', ')
0, 1, 2, 3, 4,

1
2

>>> a
range(0, 5)

This can be converted to a `list`

or `tuple`

if we wish to use it as such

1
2

>>> list(range(5)), tuple(range(6, 2, -1))
([0, 1, 2, 3, 4], (6, 5, 4, 3))

`list`

types

A list is a collection which is ordered and changeable. In Python lists are written with square brackets.

1
2
3

>>> this_list = ["apple", "banana", "cherry"]
>>> this_list
['apple', 'banana', 'cherry']

Lists are indexed in eactly the same ways as tuples

1
2

>>> this_list[1::-1]
['banana', 'apple']

We have already encountered `.append`

. There are also other additional things with lists as we have extra methods available, some are shown below

1
2
3

>>> fruits = ['pear', 'banana', 'kiwi', 'apple', 'banana', 'grape']
>>> fruits.index('banana', 4) # Find next banana starting a position 4
4

if no argument is provided it will pop `-1`

1
2

>>> fruits.pop(0)
'pear'

pop will also remove that item from the list

1
2

>>> fruits
['banana', 'kiwi', 'apple', 'banana', 'grape']

Both `list`

and `tuple`

types can be multiplied

1
2

>>> fruits[:2] * 2
['banana', 'kiwi', 'banana', 'kiwi']

1
2

>>> [tuple(fruits[:2])] * 2
[('banana', 'kiwi'), ('banana', 'kiwi')]

For a more complete list google something like “list methods python”

### Example: Fibonacci Series

Here we use use the slice notation and python’s builtin `sum`

function to simplify

1
2
3
4
5

>>> result = [0, 1]
>>> for i in range(5):
... result.append(sum(result[-2:]))
>>> result
[0, 1, 1, 2, 3, 5, 8]

#### Example: A common misunderstood feature with the python `list`

1
2
3
4
5

>>> a_list = list(range(3))
>>> b_list = [a_list] * 3
>>> a_list.append('test')
>>> b_list
[[0, 1, 2, 'test'], [0, 1, 2, 'test'], [0, 1, 2, 'test']]

Whereas with tuples…

1
2
3
4
5

>>> a_tuple = tuple(range(3))
>>> b_tuple = (a_tuple,) * 3
>>> a_tuple += ('test',)
>>> b_tuple
((0, 1, 2), (0, 1, 2), (0, 1, 2))

## Mutable vs. Immutable object types

Though tuples may seem similar to lists, they are often used in different situations and for different purposes

- Immutable objects can’t be changed.
- Mutable objects can be changed.

***Immuatable object types like tuples cannot be mutated by assignment**

1
2
3
4
5
6
7
8
9
10
11
12
13

>>> example_tuple = 1, 2, 3, 4
>>> example_tuple[2] = 5
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-3faa60cab3a6> in <module>()
1 example_tuple = 1, 2, 3, 4
----> 2 example_tuple[2] = 5
TypeError: 'tuple' object does not support item assignment

***Mutable object types like lists can be mutated by assignment**

1
2
3
4

>>> example_list = [1, 2, 3, 4]
>>> example_list[2] = 5
>>> example_list
[1, 2, 5, 4]

### A brief foray into `c`

Python was concieved as a scriptiong language built on the more low level language `c`

. In `c`

, programmers must be more-so aware that variables are simply link to memory address locations (Ever heard of RAM?). The random access memory (RAM) is where all these variables are stored. Conceptually imagine it as a big list with numbers assigned to it

We can query what the address location of each python object is by doing

1
2
3

>>> id(0) # always put comments at least 1 space away from the last code piece
>>> id([0]) # unless visually making a block like this
4532216456

From this we can see that the object `0`

has an address location and the object `[0]`

has another different address location

What a mutable type **really** means is that we can change the address location of the type’s contents

1
2
3

>>> example_list = list(range(5)) # remember that a range(n) give an iterable from 0 to n-1
>>> id(example_list), example_list
(4534667592, [0, 1, 2, 3, 4])

***whilst keeping the memory address of the container the same**

1
2
3

>>> example_list += [5] # note that you can put any math operator infront of =
>>> id(example_list), example_list # to do an inplace assignment! e.g. -=, /=, %= etc
(4534667592, [0, 1, 2, 3, 4, 5])

Whereas with the `tuple`

1
2
3

>>> example_tuple = tuple(range(5))
>>> id(example_tuple), id(example_tuple)
(4532001432, 4532001432)

***the address changes when it is mutated**

1
2
3

>>> example_tuple += (5,)
>>> id(example_tuple), example_tuple
(4531868968, (0, 1, 2, 3, 4, 5))

### Important note about address locations

If the variable is assigned to another variable, the variable *points to* the same address location which can be seen by

1
2
3
4

>>> example_tuple = tuple(range(5))
>>> another_variable = example_tuple
>>> id(example_tuple), id(another_variable)
(4532001784, 4532001784)

However, if we then change the address location of the original variable **we break the link!**

1
2
3

>>> example_tuple += (5,)
>>> id(example_tuple), id(another_variable)
(4531868872, 4532001784)

1
2

>>> example_tuple, another_variable
((0, 1, 2, 3, 4, 5), (0, 1, 2, 3, 4))

Whereas with mutable types like lists this reference persists!

1
2
3
4
5

>>> example_list = list(range(5))
>>> another_variable = example_list
>>> example_list += (5,)
>>> id(example_list) == id(another_variable)
True

A better way of checking the variables are exactly the same **object value** is by using `is`

1
2

>>> example_list is another_variable
True

A slice, will create a copy of a `list`

object to a new address location. A blank slice will also do this

1
2

>>> example_list[:] is another_variable
False

For a more detailed overview of how exactly this works check out https://realpython.com/pointers-in-python/#immutable-vs-mutable-objects

## Numpy Arrays

Numpy arrays are **exceeedingly important** to your future career as a python expert!
They offer a way of doing optimised mathematics on lists.

### The need for `numpy`

arrays

Take the following as an example:

1

>>> arr = list(range(3))

Note that we can’t simply multiply a normal list and expect its elements to be doubled! The example below is expected behavour.

1
2

>>> arr * 2
[[1, 2, 3], [1, 2, 3]]

Think of a list like an object e.g. like an apple. If you multiply 10xApples you don’t suddenly expect 10x the number of seeds and one apple. As with the 10 apples you should expect 10 lists.

Thus we are required to do something like

1
2
3

>>> arr2 = arr.copy() # As a teaser, try without .copy() and see what happens to arr!
>>> for i in range(arr.shape[0]):
... arr2[i] = arr[i] * 2

### The `numpy`

solution

Compare to the numpy solution

1
2
3
4

>>> import numpy as np
>>> arr = np.array([1, 2, 3])
>>> np.array(arr) * 2
array([0, 2, 4])

In fact we can use

`np.arange`

to further simplify

1 >>> np.arange(3) * 2

Not only is this more concise, but `numpy`

uses optimised C++ libraries that are incredibly difficult for humans to beat in terms of speed. Thus all operations we can push to the C++ part of `numpy`

are as fast if not faster than some seriously intense C++ code (As a reminder C++ is a very fast language and what most core quant pricers are written in as a result)

As an example you may have a look at this `numpy`

module which does matrix multiplication (note `matmul`

is actually `arr * arr.T`

not `arr * 2`

) this will call a library called BLAS if possible which is a **highly optimised** linear algebra module written in Fortran and C++. In a nutshell: Goodluck at writing faster code than the authors of BLAS routines.

### Iteration and indexing in `numpy`

Iteration and indexing in numpy are somewhat funky and this can make newwer users veryu confused between `list`

and `numpy.array`

iteration… I will try to not confuse you by making explicit what is specifically only allowed in numpy and what is allowed in `list`

objects as well

All normal list indexing tricks can be used on `numpy`

arrays

1
2
3
4
5
6

>>> import numpy as np
>>> arr = np.arange(10)
>>> arr[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> arr[:9]
array([0, 1, 2, 3, 4, 5, 6, 7, 8])

#### Masking

However **only in numpy arrays** we can use an array itself to

**mask**or select items

1
2

>>> arr[arr > 5]
array([6, 7, 8, 9])

This will not work in a normal python list!

1
2
3
4
5
6
7
8
9

>>> l = list(range(10))
>>> l[l > 5]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-33f477ef3cbe> in <module>()
1 l = list(range(10))
----> 2 l[l > 5]
TypeError: '>' not supported between instances of 'list' and 'int'

This works because solely because

1
2

>>> arr > 5
array([False, False, False, False, False, False, True, True, True, True])

as a comparison see what happens when we do

1
2

>>> arr[[True] * 3 + [False] * 7]
array([0, 1, 2])

similarly we could do (remember the bitwise logic?)

1
2

>>> arr[(arr == 2) | (arr == 5)]
array([2, 5])

#### Fancy indexing

Another thing we can do in `numpy`

is index explicitly with another array

1
2
3
4

>>> arr = np.arange(10)**2
>>> idx = np.arange(10) * 3
>>> arr[idx[idx < 10]] # take this statement apart to understand it!
array([ 0, 9, 36, 81])

Whereas a similar statement won’t work with lists

1
2
3
4
5
6
7
8
9

>>> l = list(range(10))
>>> l[[0, 2, 4]]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-5efbb062c778> in <module>()
1 l = list(range(10))
----> 2 l[[0, 2, 4]]
TypeError: list indices must be integers or slices, not list

Confusingly, we can actually use a raw `list`

to index a numpy array in fact we can use both `list`

and `tuple`

1
2
3
4

>>> arr = np.arange(10)**2
>>> l = [3, 7, 9]
>>> arr[l]
array([ 9, 49, 81])

### Optimising code for `numpy`

The key takeaway is to push all our compute into

`numpy`

’s C++ core

How do we achieve this in practice?

It can be tricky but in short use `numpy`

arrays where possible for any math that needs to be done on every element this includes `**`

`+`

`/`

`-`

`*`

`%`

`//`

`^`

`|`

`&`

as examples.

1

>>> np.arange(100)

## Exercises

### Exercise 9.1: Nested loops

This exercise is to help you understand iteration across more than one dimension.

Find the trace of the matrix `a`

. The trace is defined as the sum of all the diagonal elements (this is 14 from visual inspection).

1

a = np.array([[1, 3, 5],[1, 4, 6],[7, 6, 9]])

1
2
3
4

>>> a
array([[1, 3, 5],
[1, 4, 6],
[7, 6, 9]])

**Hint** You may also want to play with

1
2

>>> for i, ai in enumerate(a):
... print(i, ai)

recalling how multiple variable assignment works. Google may be your friend here!

**Hint** For the less maths-inclined the diagonal is `[1, 4, 9]`

. We don’t want the anti-diagonal of `[5, 4, 7]`

.

1

```
# Solve me!
```

### Exercise 9.2: Monte Carlo modelling using Geometric Brownian Motion

This exercise is to expand on the previous example with a bit of quantitative finance (Don’t worry though the maths will all be solved out for you!).

Here you will be guided through a monte carlo stock price model using Geometric Brownian Motion. You will also show that the returns from a GBM model follow a normal distribution.

This exercise is intended to be partcularly challenging and may require some googling but don’t give up - 99% of coding is being frustrated and feeling stupid… I sort of didn’t tell you that when you signed up haha!

#### Exercise 9.2.1: Storing iteration results in a list

Modify your previous example to the brownian motion exercise so that you store the results in a list named `path`

while the stock price is greater than 0. Set `path_len = 504`

(i.e. 2 years) and only run for this many time steps.

Use the same values for `dt = 1/252`

, `sigma`

, `mean`

etc that you used before, some (not all) are shown below

1
2
3
4
5
6
7
8

import numpy as np
np.random.seed(42)
sigma = .6729 # annualised volatility of 67.29%
dt = 1/252 # using annualised vol so need days as frac of yr
r = 0.02 # annualised expected return
path_len = 504 # 2 years of 252 business days
# solve me

as a check you should get the following

1
2
3
4
5
6

>>> path[:5]
[100,
102.04421752046065,
101.36484570799986,
104.10104255605592,
110.95250573627207]

#### Exercise 9.2.2: Plotting results

Plot your stock path using `matplotlib`

- you will certainly want to google this

**Hint** Just use `plt.plot`

nothing fancy!

1

```
# Solve me!
```

you should get the following

#### Exercise 9.2.3: Monte Carlo

In Exercise 9.2.1 you simulated a single path of a stock using GBM. Now create `paths_num = 1000`

paths and store each of these paths in another list named `paths`

**Hint** `paths`

will be a list of lists & you will require two `for`

loops. I have declared the empty list for you to fill below
and to help you on the way I have done the first loop… You will need to finish the code off by adding another loop and the formula for caluclating $s_t$.

Lets restrict this time to one year of simulation to save our CPUs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

import numpy as np
np.random.seed(42)
sigma = .6729 # annualised volatility of 67.29%
dt = 1/252 # using annualised vol so need days as frac of yr
r = 0.02 # annualised expected return
path_len = 252 # only simulating 1y this time
paths_n = 1000
paths = []
for i in range(paths_n):
s_t = 100
path = [s_t]
...
# solve me!

(The full solution takes about 3-4seconds to run on my macbook) TO verify your solution you should get

1
2
3
4
5
6

>>> paths[3][:5]
[100,
104.481881330148,
100.2784519513296,
112.02421958412168,
114.29778249864029]

#### Exercise 9.2.4: Plotting multiple lines on one plot

Plot all 1000 paths with `pandas`

using `dt`

as the index. This is a little more involved so read carefully!

When we created `paths`

we created an object like

1
2
3
4
5
6

[
[ s_00, s_01, ... , s_0n], # indexed by paths[0]
[ s_10, s_11, ... , s_1n], # indexed by paths[1]
[ ... , ... , ... , ... ],
[ s_m0, s_m1, ... , s_mn] # indexed by paths[m]
] # ^paths[m][0] ^paths[m][n]

This makes the zeroth axis the `path`

index and the first axis the time index. However, in `pandas`

**and** plotting in `matplotlib`

we generally store the time index as the zeroth index and the `path`

index as the first index.

To store in `pandas`

we need to transpose (swap round the referencing) so that in the above example we would change `paths[m][n] -> paths[n][m]`

. This is the same as transposing in Excel.

You can do this with a few different options

`paths_df = pd.DataFrame(paths).T`

`paths_df = pd.DataFrame(list(zip(*paths)))`

`paths_df = pd.DataFrame(np.array(paths).T)`

For sanity would should also probably make the index match the timesteps we are using. Lets create a proper index.

1
2
3

import pandas as pd
import numpy as np
index = pd.Index(np.arange(path_len) * dt, name='years')

Now create the `DataFrame`

with this index. Google if you get stuck.

1

paths_df = pd.DataFrame(paths, columns=index).T

Make sure that you can pass this test! If you can’t think about the order in which you created the index and then transposed the `DataFrame`

1

assert all(paths_df.index.values == index.values)

Now plot with the default `pandas`

plot function. Make sure to pass the argument `legend=None`

!

1

```
# Solve me!
```

you should get the following

### Exercise 9.2.5: Prove empirically GBM returns are Normally distributed

Prove that the 1-step stock returns are normally distributed under the GBM model by fitting a Gaussian distribution to the probability distribution function of the 1-day returns - Don’t worry it’s not as hard as it sounds :)

#### Exercise 9.2.5.1: Calculate 1-day returns using a `pandas`

object

You will need to calculate 1-day returns. You have two options for this:

- Absolute returns $\frac{s_{t} - s_{t-1}}{s_{t-1}}$
- Log returns $\ln s_{t} - \ln s_{t-1}$

Without proof 2 is equal to 1 if $\Delta t \ll T$

*e.g. if using years you want to be using a unit of $\Delta t$ that is roughly 100th of a year or smaller*

**Hint** there are four things that might help in googling: `"pandas diff"`

, `numpy natural log`

, `"pandas shift"`

, `"pandas dropna"`

also remember Google !!

1
2

rets_1d = ...
# Solve me!

You should get the following as a check

1
2
3
4
5
6

>>> rets_1d.iloc[:3, :2]
0 1
years
0.003968 0.020236 0.038088
0.007937 -0.006680 0.089136
0.011905 0.026636 0.042946

#### Exercise 9.2.5.3: Check your results

This isn’t so much of an exercise but simply a check for the previous exercise to ensure you have it correct! The maths isn’t too important. If you like, you can ignore it all and just run the code below. **You should expect differences less than 1% if you have it correct**

Without proof (see here for one) we state that the n-day returns, $R_n$ follow a Normal distribution

\[R_n \sim \mathcal{N}(\mu_n, \sigma_n)\]where

\(\mu_n = \left(r_T - \frac{1}{2}\sigma_T^2\right)n\Delta t\) \(\sigma_n = \sigma_T \sqrt{n\Delta t}\)

such that for T=252 (i.e. a year in our model) $r_T$ is the annualised rate of return (drift) and similarly $\sigma_T$ is the annualised volatility.

Lets take as a fact that we can model a probability distribution function (how likely an event) by using `scipy.stats.norm.fit`

. Then the following snippet of code will confirm your results with theory as a check

1
2
3
4
5
6
7
8

# rets_1d should be your result from previous ... you should be able to run it to get the below
import scipy.stats
mean_th = (r - .5*sigma**2)*dt
stdv_th = sigma * np.sqrt(dt)
mean_ac, stdv_ac = scipy.stats.norm.fit(rets_1d)
print(f'Difference in mean samples vs theory: {100*(mean_th - mean_ac)/mean_ac:5.2f}%')
print(f'Difference in stdv samples vs theory: {100*(stdv_th - stdv_ac)/stdv_ac:5.2f}%')

**Note** Google `f-string`

formatting!

After running the code about you should get the following the show that you have converged to the mean and standard deviation of a normal distribution

1
2

Difference in mean samples vs theory: 0.99%
Difference in stdv samples vs theory: -0.01%

#### Exercise 9.2.5.3: Plot theory vs. empirical results (reusing `matplotlib`

axes)

This exercise isn’t so much an exercise rather than an example of how to resuse a `matplotlib`

axis object to draw another line.

Compare this with the theory visually by plotting the theoretical distribution and showing its equivalence with the following code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

import scipy.stats
# you can access the numpy array beneath the pandas object by .values - then everything is numpy again
rets_1d_all_paths = rets_1d.values.ravel() # get the returns from every path for more samples (google ravel)
x = np.linspace(-.5, .5, 200) # an array of 200 numbers equally spaced between -.5 and .5
# note that you can chain "." methods in ()
# because recall that unless you have (item,) it doesn't create a tuple!
ax = (
pd.Series(rets_1d_all_paths) # convert back to pd.Series to use .hist()
.hist(bins=100, density=True, label='1-day Monte Carlo')
)
# use the same axis object that was returned from .hist()
theory = scipy.stats.norm.pdf(x, mean_th, stdv_th)
ax.plot(x, theory, alpha=0.5, label='1-day Analytic')
ax.legend()

you should get the following

### [Optional: This requires undergrad maths!] Exercise 9.2.5.4: Show this holds for 20-day returns

Show the same is true for 20day returns by plotting the relationship with 20-day returns

Note that this exercise is really aimed at structurers, derivatives traders and quants - if you haven’t studied stochastic calculus at university then just ignore this question or ask a quant if you find it interesting

1

```
# Solve me!
```

you should get the following

## Next Topic