It’s important to know what goes on inside a machine learning algorithm. But it’s hard. There is some pretty intense math happening, much of which is linear algebra. When I took Andrew Ng’s course on machine learning, I found the hardest part was the linear algebra. I’m writing this for myself as much as you.

So here is a quick review, so that the next time you look under the hood of an algorithm, you feel more confident. You can view the IPython notebook (usually easier to code with) on my GitHub.

# The Basics

**matrix** – a rectangular array of values

**vector** – a matrix with a single row or a single column

**identity matrix I** – an n x n diagonal matrix with ones on the diagonal from the top left to the bottom right, and zeros everywhere else.

i.e.

[[ 1., 0., 0.],

[ 0., 1., 0.],

[ 0., 0., 1.]]

When a matrix A is multiplied by its inverse A^-1, the result is the identity matrix I. Only square matrices have inverses. Example below.

Note – the inverse of a matrix is not the transpose.

__Matrices are notated m x n, or rows x columns. A 2×3 matrix has 2 rows and 3 columns.__ Read this multiple times.

You can only add matrices of the same dimensions. You can only multiply two matrices if the first is m x n, and the second is n x p. The n-dimension has to match.

Now the basics in Python

```
import numpy as np
A = np.array([[3, 2, 4]])
B = np.array([[1], [5], [5]])
print("rows by columns, or m by n")
print("A is", A.shape)
print("B is", B.shape)
print("A * B = ", np.dot(A, B)) # note -> A*B is not matrix
# multiplication in numpy!!!
```

[Output]

    rows by columns, or m by n
    A is (1, 3)
    B is (3, 1)
    A * B =  [[33]]
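To make the dimension rule concrete, here is a small sketch using the same A and B. It uses the `@` operator, which is equivalent to `np.dot` for 2-D arrays, and shows what happens when the inner dimensions don’t match.

```python
import numpy as np

A = np.array([[3, 2, 4]])        # shape (1, 3)
B = np.array([[1], [5], [5]])    # shape (3, 1)

# Inner dimensions match in both orders, so both products are defined.
print((A @ B).shape)   # (1, 1)
print((B @ A).shape)   # (3, 3)

# Element-wise * is not matrix multiplication: with these shapes NumPy
# broadcasts and returns a 3 x 3 array of pairwise products.
print((A * B).shape)   # (3, 3)

# (1, 3) times (1, 3): the inner dimensions are 3 and 1, so this fails.
try:
    A @ A
except ValueError as err:
    print("cannot multiply (1, 3) by (1, 3):", err)
```

Note that order matters: A times B is 1 x 1, while B times A is 3 x 3.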

And for using identity matrices in numpy, use the eye() function.

```
# If a matrix A is multiplied by the identity matrix, the result is A.
A = np.array([[1,2,3,4]])
print(np.dot(A, np.eye(4))) # equals A!!!
```

[[ 1. 2. 3. 4.]]

Calculating the inverse with inv() is important too, as is pinv(), which computes the pseudo-inverse and also works for non-square or singular matrices. Another important function is transpose().

    B = np.array([[1,2],[3,4]])
    print(np.dot(B, np.linalg.inv(B))) # returns the identity matrix (approximately)
    print(B.transpose())

```
[[ 1.00000000e+00 0.00000000e+00]
[ 8.88178420e-16 1.00000000e+00]]
[[1 3]
[2 4]]
```
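A short sketch to back up two points from above: the inverse is not the transpose, and pinv() still gives you something useful when the matrix isn’t square. The matrix C here is made up for illustration.

```python
import numpy as np

B = np.array([[1.0, 2.0], [3.0, 4.0]])

# The inverse and the transpose are different matrices in general.
print(np.allclose(np.linalg.inv(B), B.T))            # False

# Round-off means B @ inv(B) is only approximately I,
# so compare with allclose rather than ==.
print(np.allclose(B @ np.linalg.inv(B), np.eye(2)))  # True

# A non-square matrix has no inverse, but it does have a pseudo-inverse.
C = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])                       # shape (2, 3)
C_pinv = np.linalg.pinv(C)                            # shape (3, 2)
print(C_pinv.shape)

# C has full row rank, so C @ pinv(C) is the 2 x 2 identity.
print(np.allclose(C @ C_pinv, np.eye(2)))             # True
```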

# Eigenvalues

An **eigenvalue** of a matrix A is a number you can multiply some vector X by and get the same answer you would get by multiplying A and X. In this situation, the vector X is an eigenvector. More formally –

Def: Let A be an n x n matrix. A scalar λ is called an **eigenvalue** of A if there is a nonzero vector X such that AX = λX.

Such a vector X is called an **eigenvector** of A corresponding to λ.

There is a way to compute the eigenvalues of a matrix by hand, and then a corresponding eigenvector, but it’s a bit beyond the scope of this tutorial.

    # *** eigenvalues and eigenvectors *** #
    A = np.array([[2, -4], [-1, -1]])
    x = np.array([[4], [-1]]) # a suspected eigenvector
    eigVal = 3 # a suspected eigenvalue
    print(np.dot(A, x), "\n")
    print(eigVal * x) # They match!

[output]

    [[12]
     [-3]]

    [[12]
     [-3]]

Now that we know 3 is an eigenvalue of A, let’s compute all of its eigenvalues with numpy!

    w, v = np.linalg.eig(A)
    print(w) # the eigenvalues of matrix A

[ 3. -2.]

Ok, so the square matrix A has *two* eigenvalues, 3 and -2! But what about the corresponding eigenvector?

    v[:, 0] # this is the normalized eigenvector corresponding to w[0], or 3.
    # let's unnormalize it to see if we were right.
    length = np.linalg.norm(x) # the length of our original eigenvector x
    print(v[:, 0] * length)
    print("Our original eigenvector was [4, -1]")

[output]

    [ 4. -1.]
    Our original eigenvector was [4, -1]

Woohoo! Note – it’s important to remember that every nonzero multiple of this eigenvector is also an eigenvector of A corresponding to the same eigenvalue (lambda).
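For a 2 x 2 matrix, the by-hand route is short enough to sketch. The eigenvalues are the roots of det(A - λI) = 0, which for a 2 x 2 matrix works out to λ² - trace(A)·λ + det(A) = 0. The snippet below is just an illustration of that idea using the same matrix A, with a check that every pair returned by numpy really satisfies AX = λX.

```python
import numpy as np

A = np.array([[2, -4], [-1, -1]])

# Characteristic polynomial of a 2 x 2 matrix:
# lambda^2 - trace(A) * lambda + det(A)
trace = np.trace(A)                      # 2 + (-1) = 1
det = np.linalg.det(A)                   # 2*(-1) - (-4)*(-1) = -6
by_hand = np.roots([1, -trace, det])     # roots of lambda^2 - lambda - 6
print(np.sort(by_hand))

# np.linalg.eig should agree with the roots above.
w, v = np.linalg.eig(A)
print(np.sort(w))

# Each column v[:, i] is an eigenvector for the eigenvalue w[i].
for i in range(len(w)):
    print(np.allclose(A @ v[:, i], w[i] * v[:, i]))
```

Factoring by hand gives (λ - 3)(λ + 2) = 0, which matches the eigenvalues 3 and -2 found earlier.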

# Determinants

A determinant is a single value calculated from a square matrix. Determinants show up in most of linear algebra beyond matrix multiplication.

We can see where this comes from if we look at the determinant for a 2 x 2 matrix.

Imagine we have a square matrix A = [[a, b], [c, d]].

We can define its inverse as A^-1 = 1 / (ad - bc) × [[d, -b], [-c, a]].

That bit in the denominator, ad - bc, is the **determinant**. If it is 0, the matrix is *singular* (no inverse!).

It has a ton of properties. For example, the determinant of a matrix equals that of its transpose.

Determinants tell you whether a matrix can be inverted, and matrix inverses are used in a ton of machine learning algorithms (e.g. the normal equation in linear regression!).
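Since the normal equation came up, here is a minimal sketch of it on made-up data. The data and the variable names (`X`, `y`, `theta`) are my own assumptions for illustration; the formula itself is the usual theta = (XᵀX)^-1 Xᵀy.

```python
import numpy as np

# Hypothetical toy data generated from y = 1 + 2*x, so the answer is known.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones (the intercept) next to the feature column.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)   # approximately [1. 2.]

# This only works if X^T X is invertible, i.e. its determinant is nonzero.
print(np.linalg.det(X.T @ X))
```

If XᵀX turns out to be singular (for example, because two features are duplicates), swapping inv() for pinv() is a common workaround.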

    # ************************ Determinants ************************ #
    A = np.array([[1, 2], [3, 4]])
    print("det(A) = ", np.linalg.det(A))

`det(A) = -2.0`
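Here is a quick sketch checking the properties mentioned in this section: the determinant of the transpose, the standard 2 x 2 inverse formula (swap the diagonal, negate the off-diagonal, divide by the determinant), and what happens with a singular matrix. The matrix S is made up for the example.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

# det(A) equals det(A^T).
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))   # True

# 2 x 2 inverse by formula: swap a and d, negate b and c, divide by det.
(a, b), (c, d) = A
det = a * d - b * c                                       # 1*4 - 2*3 = -2
inv_by_formula = np.array([[d, -b], [-c, a]]) / det
print(np.allclose(inv_by_formula, np.linalg.inv(A)))      # True

# A singular matrix: the second row is twice the first, so det is 0
# and np.linalg.inv raises LinAlgError.
S = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.det(S))
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("no inverse:", err)
```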

# Singular Value Decomposition

SVD is a technique to factorize a matrix, or a way of breaking the matrix up into three matrices.

SVD is used in techniques like Principal Component Analysis. The singular values in the SVD can help you determine which features are redundant, and therefore reduce dimensionality!
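To see how the SVD can flag a redundant feature, here is a small sketch on made-up data where the third column is exactly twice the first. This is only an illustration of the idea, not a full PCA.

```python
import numpy as np

# Hypothetical data: 4 samples, 3 features. The third feature is exactly
# twice the first, so it carries no new information.
X = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 4.0],
              [3.0, 0.0, 6.0],
              [4.0, 1.0, 8.0]])

# compute_uv=False returns only the singular values, largest first.
s = np.linalg.svd(X, compute_uv=False)
print(s)

# One singular value is (numerically) zero, so the rank is 2, not 3.
print(np.sum(s > 1e-10))            # 2
print(np.linalg.matrix_rank(X))     # 2
```

With real data the redundancy is rarely exact, so you would look for singular values that are very small relative to the largest one rather than exactly zero.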

It’s actually considered its own data mining algorithm.

It uses the formula *M = UΣVᵀ*, then uses the properties of these matrices (i.e. *U* and *V* are orthogonal, *Σ* is a diagonal matrix with non-negative entries) to further break them up.

Here is a slightly more math-intensive example.

And of course, there’s a function in numpy 😀 .

    # ******** Singular Value Decomposition *********** #
    A = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                  [9,10,11,12, 4,23,45, 2],
                  [5, 3, 5, 2,56, 3, 6, 4]])
    U, s, V = np.linalg.svd(A) # note: numpy returns V already transposed
    print(U)

```
array([[-0.18149711, 0.07590154, 0.98045793],
[-0.65271926, 0.73643135, -0.17783826],
[-0.73553815, -0.6722409 , -0.08411777]])
```
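As a sanity check, the three factors can be multiplied back together to recover A. This sketch uses `full_matrices=False`, which returns the compact shapes so the product lines up without padding, and names the third factor `Vt` as a reminder that numpy returns it already transposed.

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
              [9, 10, 11, 12, 4, 23, 45, 2],
              [5, 3, 5, 2, 56, 3, 6, 4]])

# Compact SVD: U is (3, 3), s is (3,), Vt is (3, 8).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)

# Multiplying the three factors back together recovers A.
print(np.allclose(U @ np.diag(s) @ Vt, A))       # True

# U has orthonormal columns, and the singular values are
# non-negative and sorted from largest to smallest.
print(np.allclose(U.T @ U, np.eye(3)))           # True
print(np.all(s >= 0) and np.all(s[:-1] >= s[1:]))
```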

I tried not to get too bogged down with the math in this tutorial, but there is a lot more to explore. There is a significant difference in mathematical depth between data scientists and machine learning researchers. For ML researchers, this stuff is a foundation.

While in data science it’s not as important, I personally think understanding (if possible) the algorithm you’re using is a noble goal.

Thanks! Nice tutorial. In your experience, what was the best resource (or a list of resources) for learning Linear Algebra in a way applicable to understanding machine learning algorithms?

Hey Alexander, thanks for the comment!

I guess that depends on your intentions. For a basic refresher – Khan Academy. (https://www.khanacademy.org/math/linear-algebra)

If you want a slightly more comprehensive understanding to make you a better data scientist, Andrew Ng’s class on Coursera has a week of linear algebra. https://www.coursera.org/learn/machine-learning/home/week/3

For a more academic understanding, so you can play around with tools like PyLearn2 or Matlab/Octave, getting a good machine learning textbook (Kevin Murphy’s is my fav) is really helpful. By doing this you’ll just absorb a lot of linear algebra. This is how I prefer to learn.

And if you really want to be a pro, take a free online course such as this (http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/). Or read a linear algebra textbook like http://math.mit.edu/~gs/linearalgebra/ .

Another option is to start with linear and logistic regression, and just really try to understand the math. This will take you through gradient descent, matrix derivatives and a lot of different operations.

Thanks for reading my article!

Also meant to add these. They’re a bit more academic but accessible enough.

http://cs229.stanford.edu/notes/cs229-notes1.pdf

http://www.cs.cmu.edu/~jingx/docs/linearalgebra.pdf

Please, beware that: “The inverse of a matrix will have the opposite dimensions. If a matrix A is 4 x 5, then it’s inverse A^-1 will be 5 x 4” is NOT true generally.

From Wikipedia (https://en.wikipedia.org/wiki/Invertible_matrix):

“An n-by-n square matrix A is called invertible if there exists an n-by-n square matrix B such that AB=BA=I”

“Non-square matrices (m-by-n matrices for which m ≠ n) DO NOT HAVE an inverse. ”

Regards

Thanks for catching my mistake! I’ve corrected it. I think I briefly confused transpose and inverse, even though on the next line I warned against it haha
