Linear Algebra Reference

Basic Facts

The following are some definitions and basic facts about matrices. You can treat an NxM matrix with scalars from some field \(F\) as a linear map from the vector space \( F^M \) to \( F^N \). From here on out, we’ll be assuming that our field is the real numbers, but we could be using the rationals, the complex numbers, finite fields, or even certain fields of functions.

Vector Spaces

  • we say a set of vectors spans a subspace if every element of that subspace can be written as a linear combination of those vectors. We call the vectors a basis for the subspace if they are also linearly independent. That is, no vector in the set can be written as a linear combination of the other vectors.
  • for finite dimensional vector spaces, two vector spaces over the same field are isomorphic (i.e. there is a linear bijection from one space to the other) if and only if they have the same dimension. That’s right, all that is really important about a vector space is the field and the dimension, as long as the dimension is finite.
  • if there is a linear bijection from one vector space \( R^M \) to another \( R^N \), then it has an inverse which is also a linear bijection AND N = M.
  • if V is the finite dimensional vector space \( R^N \), then it takes at least N vectors to span V. So all basis sets for our vector space \( R^N \) have exactly N elements.

Linear Maps

  • a linear map is just a function from one vector space to another that preserves scalar multiplication and vector addition. Formally we state that a linear map, T, from \( R^M -> R^N \) satisfies \( T(\alpha x) = \alpha T(x) \) and \( T( x + y ) = T(x) + T(y) \) for every real number \( \alpha \) and all vectors x, y.
  • an injection maps distinct input vectors to distinct output vectors (it is 1 to 1); that is, \( T(x) = T(y) \) implies \( x = y \). An injective linear map takes a linearly independent set of vectors in \( R^M \) to a linearly independent set in \( R^N \).
  • a surjection from \( R^M \) to \( R^N \) covers all of \( R^N \). Formally we say this by stating that a linear map T is a surjection if for every element y in \( R^N \), we can find some element x in \( R^M \) such that \( y = T(x) \).
  • a bijection is a linear map that is both a surjection and an injection.
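
For matrix representations (introduced below), these properties can be checked numerically: a linear map given by an NxM matrix is injective exactly when its rank is M and surjective exactly when its rank is N. Here is a minimal sketch assuming NumPy; the example matrices are arbitrary illustrative choices.

```python
import numpy as np

# T : R^M -> R^N represented by an NxM matrix.
# Injective  <=> rank == M (the columns are linearly independent).
# Surjective <=> rank == N (the image is all of R^N).

def is_injective(T):
    return np.linalg.matrix_rank(T) == T.shape[1]

def is_surjective(T):
    return np.linalg.matrix_rank(T) == T.shape[0]

tall = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])        # 3x2: maps R^2 -> R^3
wide = np.array([[1.0, 0.0, 2.0],
                 [0.0, 1.0, 3.0]])   # 2x3: maps R^3 -> R^2

print(is_injective(tall), is_surjective(tall))   # True False
print(is_injective(wide), is_surjective(wide))   # False True
```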

Vectors

We can treat an Nx1 matrix (a column vector) or a 1xM matrix (a row vector) as a vector. The dot product of two vectors is then the same as \( A^T B = B^T A \) for A, B Nx1 matrices, or as \( C D^T = D C^T \) for C, D 1xM matrices.
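
As a quick check of the identity above, here is a minimal sketch assuming NumPy, with arbitrary example vectors:

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])   # Nx1 column vector
b = np.array([[4.0], [5.0], [6.0]])   # Nx1 column vector

# A^T B and B^T A are 1x1 matrices holding the same dot product.
print((a.T @ b).item(), (b.T @ a).item())   # 32.0 32.0
print(np.dot(a.ravel(), b.ravel()))         # 32.0, the ordinary dot product
```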

Linear Maps and Matrices

It turns out that we can treat linear maps from \( R^M -> R^N \) as identical to NxM matrices. Assuming A is an NxM matrix:

  • rank of a matrix, rank(A) := the number of linearly independent rows of A (which equals the number of linearly independent columns)
  • row space is the vector subspace of \( R^M \) spanned by the N rows
  • null space (kernel) is the set of all x in \( R^M \) with \( Ax = 0 \); it is the subspace of \( R^M \) orthogonal to the row space
  • column space is the vector subspace of \( R^N \) spanned by the M columns
  • cokernel (left null space) is the set of all y in \( R^N \) with \( A^T y = 0 \); it is the subspace of \( R^N \) orthogonal to the column space
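
These four subspaces can be extracted numerically from the singular value decomposition (covered later): the rows of \( V^T \) past the rank span the null space, and the columns of U past the rank span the cokernel. A minimal sketch assuming NumPy, with an arbitrary rank-deficient 2x3 example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # N=2, M=3, rank 1 (second row is twice the first)

r = np.linalg.matrix_rank(A)
U, s, Vt = np.linalg.svd(A)

row_space  = Vt[:r]       # rows spanning the row space, a subspace of R^3 (R^M)
null_space = Vt[r:]       # rows spanning the null space: A @ x == 0
col_space  = U[:, :r]     # columns spanning the column space, a subspace of R^2 (R^N)
cokernel   = U[:, r:]     # columns spanning the cokernel: A.T @ y == 0

print("rank:", r)                          # 1
print(np.allclose(A @ null_space.T, 0))    # True
print(np.allclose(A.T @ cokernel, 0))      # True
```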

Transpose

The transpose of an NxM matrix is the MxN matrix obtained by reflecting the elements of the matrix across the main diagonal.
\( (A^T)^T = A \)
\( (A^T)^{-1} = (A^{-1})^T \)
\( (AB)^T = B^T A^T \)
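
A minimal numerical check of these identities, assuming NumPy and arbitrary random matrices (a random square matrix is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose((A.T).T, A))                               # (A^T)^T = A
print(np.allclose((A @ B).T, B.T @ A.T))                     # (AB)^T = B^T A^T
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))   # (A^T)^-1 = (A^-1)^T
```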

Invertible Matrices

The following statements are equivalent for an NxN matrix A:

  1. A is invertible
  2. The rank of the matrix is N
  3. The row space of A is \( R^N \)
  4. The column space of A is \( R^N \)
  5. The determinant of A is nonzero: \( \det(A) \neq 0 \)

Notes

If \( A^{-1} \) exists, it is unique
\( (cA)^{-1} = c^{-1} A^{-1} \)
\( (AB)^{-1} = B^{-1} A^{-1} \)
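
A minimal numerical check of the equivalences and the inverse identities above, again assuming NumPy and arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
c = 2.5

# Invertibility checks: full rank and nonzero determinant go together.
print(np.linalg.matrix_rank(A) == 4)
print(not np.isclose(np.linalg.det(A), 0.0))

# (cA)^-1 = c^-1 A^-1 and (AB)^-1 = B^-1 A^-1
print(np.allclose(np.linalg.inv(c * A), np.linalg.inv(A) / c))
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))
```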

Orthonormal

NOTE: If the rows and columns of a square matrix are orthonormal, then its inverse is the same as its transpose. Further, its determinant is either 1 or -1. \( O(n) \) is the orthogonal group, consisting of all such NxN matrices (determinant 1 or -1). \( SO(n) \), the special orthogonal group, is the subgroup of \( O(n) \) whose matrices have determinant 1.
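
A minimal sketch assuming NumPy: the QR factorization of an arbitrary random matrix gives a matrix Q with orthonormal columns, which we can use to check the transpose-is-inverse and determinant facts.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # Q has orthonormal columns

print(np.allclose(Q.T @ Q, np.eye(3)))        # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))     # inverse equals transpose
print(np.isclose(abs(np.linalg.det(Q)), 1.0)) # determinant is +1 or -1
```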

Invertible Matrices are always Square Matrices

Assume \( A_N \) is an NxN matrix, \( B_M \) is an MxM matrix, and \( A_{N}x \) is an element of \( R^N \). Let \( C \) be a linear map taking an N dimensional vector space into an M dimensional one. Let \( C^{-1} \) take \( R^M -> R^N \). Note if we have:

  • \( C C^{-1} = I_M \) and
  • \( C^{-1} C = I_N \)

then this would imply that \( M = N \): the first equation forces \( C \) to be surjective (so \( M \le N \)), and the second forces \( C \) to be injective (so \( N \le M \)).

Calculating the Matrix Inverse

To get \( A^{-1} \) using Gaussian elimination, where \( A \) is an NxN matrix, create an Nx2N matrix \( M = [A \; I] \). Now apply row operations so that the \( A \) side becomes the identity; the right hand side will then be \( A^{-1} \). E.g.

Let A be \( \begin{bmatrix}
1 & -2 & 3 \\
2 & 1 & 5 \\
3 & 1 & 1 \\
\end{bmatrix} \) then our matrix \( M \) would be \( \begin{bmatrix}
1 & -2 & 3 & 1 & 0 & 0\\
2 & 1 & 5 & 0 & 1 & 0\\
3 & 1 & 1 & 0 & 0 & 1\\
\end{bmatrix} \). We can apply row operations so that the identity appears on the left hand side giving us:

\( \begin{bmatrix}
1 & 0 & 0 & 4/33 & -5/33 & 13/33 \\
0 & 1 & 0 & -13/33 & 8/33 & -1/33 \\
0 & 0 & 1 & 1/33 & 7/33 & -5/33 \\
\end{bmatrix} \)
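
Here is a short sketch of the same Gauss-Jordan procedure in NumPy. The helper gauss_jordan_inverse is written just for this example, and the final check compares its output against the fractions above.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = col + np.argmax(np.abs(M[col:, col]))
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                   # scale the pivot row so the pivot is 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]  # clear the column in every other row
    return M[:, n:]                             # the right half is now A^-1

A = np.array([[1, -2, 3],
              [2,  1, 5],
              [3,  1, 1]])
A_inv = gauss_jordan_inverse(A)

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
print(np.allclose(33 * A_inv,                 # matches the fractions shown above
                  [[4, -5, 13], [-13, 8, -1], [1, 7, -5]]))
```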

Singular Value Decomposition

Not all matrices have inverses, but even if they don’t we can compute a singular value decomposition of our real valued matrices to examine them. (There is a slight generalization available for complex matrices.) Assume A is an NxM matrix; then we can factor A, in what is called its singular value decomposition, as:

$$ A = U \Sigma V^T $$

where:

  • U is an N × N orthogonal matrix. This is basically a rotation in \( R^N \) and can be thought of as a change in the coordinate system used in \( R^N \).
  • \( \Sigma \) is a diagonal NxM matrix with non-negative real numbers on the diagonal
  • V is an M × M orthogonal matrix. This is basically a rotation in \( R^M \) and can be thought of as a change in the coordinate system used in \( R^M \).

From this, we can get a pseudo inverse \(A^+ = V \Sigma^{+} U^T\), where \(\Sigma^{+}\) is the MxN diagonal matrix obtained by transposing \(\Sigma\) and replacing each non-zero diagonal entry with its multiplicative inverse (reciprocal).
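
A minimal sketch assuming NumPy, using an arbitrary full-rank 2x3 example: we rebuild A from its SVD, then form the pseudo inverse by hand and compare it to np.linalg.pinv.

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])              # 2x3, no ordinary inverse

U, s, Vt = np.linalg.svd(A)                  # A = U @ Sigma @ V^T
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))        # True

# Sigma^+ : transpose the shape of Sigma and invert the non-zero singular values
# (all of them are non-zero here since this A has full row rank).
Sigma_plus = np.zeros(A.T.shape)
Sigma_plus[:len(s), :len(s)] = np.diag(1.0 / s)

A_plus = Vt.T @ Sigma_plus @ U.T             # A^+ = V Sigma^+ U^T
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```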

Useful Tricks and Tips

Note: sometimes it’s useful to write an NxM matrix A all on one line with each column laid out horizontally, such as

\( A = ( (element_{1,1},…,element_{n,1})^T, (element_{1,2},…,element_{n,2})^T,…,(element_{1,m},…,element_{n,m})^T ) \)

so \( A = ( A_{col(1)}, A_{col(2)},…, A_{col(m)} ) \) where \( A_{col(j)} = (element_{1,j}, …, element_{n,j})^T \)

note: A is also \( = \begin{bmatrix} A_{row(1)} \\ A_{row(2)}\\ … \\ A_{row(n)} \end{bmatrix} \)

thus
\( \begin{bmatrix}
1 & -2 & 3 \\
2 & 1 & 5 \\
\end{bmatrix} = ( (1, 2)^T, (-2, 1)^T, (3, 5)^T )
\)

so
\(
\begin{bmatrix}
1 & -2 & 3 \\
2 & 1 & 5
\end{bmatrix} * \begin{bmatrix}
4 \\
3 \\
2
\end{bmatrix} = 4 * \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 3 * \begin{bmatrix} -2 \\ 1 \end{bmatrix} + 2 * \begin{bmatrix} 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 \\ 21 \end{bmatrix}
\)

Using this nomenclature, matrix multiplication \( C = AB \) can be thought of as follows: each element \( c_{ik} \) of the resulting matrix C is the dot product of the ith row of A with the kth column of B. That is: $$ AB = [ A_{row(i)} ] [B_{col(k)}] = [ A_{row(i)} \cdot B_{col(k)} ] $$

So:
\( \begin{bmatrix}
1 & -2 & 3 \\
2 & 1 & 5 \\
\end{bmatrix} \begin{bmatrix} 4 \\ 3 \\ 2 \end{bmatrix} = ( (1,2)^T, (-2,1)^T, (3,5)^T ) (4,3,2)^T = 4(1,2)^T + 3(-2,1)^T + 2(3,5)^T = (4,21)^T \)
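
Both views of the product, the column-combination view and the row-dot-column view, are easy to check numerically. A minimal sketch assuming NumPy, using the same matrix and vector as above:

```python
import numpy as np

A = np.array([[1, -2, 3],
              [2,  1, 5]])
x = np.array([4, 3, 2])

# Column view: A @ x is a linear combination of the columns of A.
col_view = 4 * A[:, 0] + 3 * A[:, 1] + 2 * A[:, 2]

# Row view: each entry of A @ x is the dot product of a row of A with x.
row_view = np.array([A[0] @ x, A[1] @ x])

print(A @ x, col_view, row_view)   # all three equal [ 4 21]
```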

Covariance Matrix

NOTE: variance is the square of the standard deviation and is the expected value of the squared difference between a random variable and its mean. Informally, variance measures how far a set of random numbers is spread out from its mean. The covariance matrix is the generalization of this to random vectors. The entries on the diagonal are the variances of the individual components, while those off the diagonal measure how one component varies with respect to another component. If \( \mu = E[X] \), where \( X \) is a random variable and \( E[X] \) is the expected value of \( X \) (i.e. its mean), then

\( Var(X) = E[(X - \mu)^2] \).

Covariance measures how two random variables, X & Y, vary together. \( Cov(X,Y) = [\, E[(X_i-\mu_{X_i})(Y_j - \mu_{Y_j})] \,] \); that is, it is a matrix such that component \( c(i,j) \) is the expected value of the product of the \( i^{th} \) component of \( X \) differing from its mean times the \( j^{th} \) component of \( Y \) differing from its mean.

https://www.math.hmc.edu/~dk/math40/math40-lect07.pdf
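
A minimal sketch assuming NumPy: we draw samples of a two-component random vector with correlated components and check that the diagonal of the covariance matrix holds the per-component variances. The sample size and coefficients are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
y = 0.5 * x + 0.1 * rng.standard_normal(1000)   # y is correlated with x
samples = np.stack([x, y])                       # rows are variables, columns are samples

C = np.cov(samples)                              # 2x2 covariance matrix
print(C)
print(np.isclose(C[0, 0], np.var(x, ddof=1)))    # diagonal entry = variance of x
print(np.isclose(C[1, 1], np.var(y, ddof=1)))    # diagonal entry = variance of y
print(np.isclose(C[0, 1], C[1, 0]))              # off-diagonal entries are symmetric
```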

Misc: Other Ways of Changing Coordinate Systems

We can define new mappings \( A' \) and \( B'' \) using \( C \) and \( C^{-1} \).

\( A' = C A_N \) is a transform from \( R^N -> R^M \)
\( B'' = C^{-1} B_M \) is a transform from \( R^M -> R^N \)

Additionally, we can use the combination of \( C \) and \( C^{-1} \) to define mappings from \( R^N -> R^N \) as well as from \( R^M -> R^M \).
\( A''' = C A_N C^{-1} \) is an \( MxM \) matrix mapping \( R^M -> R^M \)
\( B'''' = C^{-1} B_M C \) is an \( NxN \) matrix mapping \( R^N -> R^N \)

More generally, for any NxN matrix \( H_N \) and any MxM matrix \( H_M \):

  • \( G_M = C H_N C^{-1} \) is an MxM matrix
  • \( G_N = C^{-1} H_M C \) is an NxN matrix
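
A minimal sketch assuming NumPy and an invertible change of coordinates C (so, per the earlier section, M = N here): conjugating a map by C expresses the same map in the new coordinate system.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 3
C = rng.standard_normal((N, N))      # assumed invertible change of coordinates
C_inv = np.linalg.inv(C)
A_N = rng.standard_normal((N, N))    # a map written in the original coordinates

A_conj = C @ A_N @ C_inv             # the same map written in the new coordinates

x = rng.standard_normal(N)
# Apply A_N first and then change coordinates, or change coordinates first
# and then apply the conjugated map: the results agree.
print(np.allclose(C @ (A_N @ x), A_conj @ (C @ x)))   # True
```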