Linear Algebra for Novice Data Scientists
Introduction
This is a brief summary of the linear algebra concepts needed for machine learning study. It takes about an hour to go through if you have studied linear algebra before.
Basic concepts
By \(A\in\mathbb{R}^{m\times n}\) we denote a matrix with m rows and n columns.
By \(x\in\mathbb{R}^n\) we denote a vector with n entries.
\(a_{ij}\) denotes the entry of A in the ith row and jth column.
Matrix Multiplication
For matrices \(A\in\mathbb{R}^{m\times n}\), \(B\in\mathbb{R}^{n\times p}\), their product is the matrix \(C = AB \in \mathbb{R}^{m\times p}\), where \(C_{ij}=\sum\limits_{k=1}^nA_{ik}B_{kj}\).
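As a quick sanity check of the definition, the entry-by-entry sum can be compared against NumPy's built-in product (the matrices below are hypothetical examples):

```python
import numpy as np

# A is 2x3, B is 3x2, so C = AB is 2x2
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

C = A @ B  # NumPy matrix product

# Entry-by-entry definition: C_ij = sum_k A_ik * B_kj
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(3))
                      for j in range(2)]
                     for i in range(2)])

assert (C == C_manual).all()  # both give [[58, 64], [139, 154]]
```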
Operations and Properties
- Identity Matrix: square matrix with ones on the diagonal and zeros everywhere else. \(AI = A = IA\)
- Transpose: the matrix that results from “flipping” rows and columns. \((A^T)^T = A, (AB)^T = B^TA^T, (A+B)^T=A^T+B^T\)
- Symmetric Matrices: A matrix is symmetric if \(A = A^T\)
- The Trace: Sum of the diagonal elements of a square matrix: \(trA=\sum\limits_{i=1}^nA_{ii}\). \(trA=trA^T, tr(A+B)=trA+trB, trAB = trBA\)
Norms: the norm of a vector \(\Vert x\Vert\) measures the “length” of the vector.
- \({\Vert x \Vert}_2 = \sqrt{\sum\limits_{i=1}^n x_i^2}\)
- \({\Vert x \Vert}_1 = \sum\limits_{i=1}^n |x_i|\)
- \({\Vert x \Vert}_p = (\sum\limits_{i=1}^n |x_i|^p)^{1/p}\)
- \({\Vert A \Vert}_F = \sqrt{\sum\limits_{i=1}^m\sum\limits_{j=1}^n A_{ij}^2} = \sqrt{tr(A^TA)}\)
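Each of these norms is available in NumPy; a small sketch with a hypothetical vector and matrix:

```python
import numpy as np

x = np.array([3.0, -4.0])  # hypothetical vector

l2 = np.linalg.norm(x)            # Euclidean norm: sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, 1)         # sum of absolute values: 7
lp = np.sum(np.abs(x)**3)**(1/3)  # general p-norm with p = 3

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')  # Frobenius norm

# Frobenius norm equals sqrt(tr(A^T A))
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))
```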
- Linear Independence: A set of vectors \(\{x_1, x_2, \dots, x_n\}\) is said to be (linearly) independent if no vector can be represented as a linear combination of the remaining vectors.
- Rank: Size of the largest subset of linearly independent columns or rows of A. The row rank equals the column rank.
- The Inverse: \(A^{-1}A =AA^{-1} = I\). A is invertible or non-singular if \(A^{-1}\) exists.
- \((AB)^{-1} = B^{-1}A^{-1}\)
- \((A^{-1})^T = (A^T)^{-1}\)
- Orthogonal Matrices: Two vectors \(x, y \in\mathbb{R}^{n}\) are orthogonal if \(x^Ty = 0\). A square matrix is orthogonal if all its columns are orthogonal to each other and normalized. Equivalently, \(U^TU=I=UU^T\).
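Rank can be computed directly. In the hypothetical matrix below, the third column is the sum of the first two, so the columns are dependent:

```python
import numpy as np

# Third column is the sum of the first two: columns are linearly dependent
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 3., 5.]])

r_col = np.linalg.matrix_rank(A)    # only two independent columns
r_row = np.linalg.matrix_rank(A.T)  # row rank equals column rank
assert r_col == r_row == 2
```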
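The inverse identities can likewise be checked numerically on hypothetical invertible matrices:

```python
import numpy as np

A = np.array([[4., 7.], [2., 6.]])  # det = 10, so A is invertible
B = np.array([[1., 2.], [3., 5.]])  # det = -1, so B is invertible

A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(2))                    # A A^{-1} = I
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))     # (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))  # (A^{-1})^T = (A^T)^{-1}
```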
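A 2D rotation matrix is a standard example of an orthogonal matrix; a short sketch verifying \(U^TU=I=UU^T\):

```python
import numpy as np

theta = np.pi / 4  # hypothetical rotation angle
# A 2D rotation matrix has orthonormal columns, hence is orthogonal
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(U.T @ U, np.eye(2))           # U^T U = I
assert np.allclose(U @ U.T, np.eye(2))           # U U^T = I
assert np.isclose(np.linalg.norm(U[:, 0]), 1.0)  # columns are normalized
```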
The Determinant
Calculation: For a \(2\times 2\) or \(3\times 3\) matrix, sum of products along the main diagonals minus sum along the counter-diagonals (Sarrus' rule); for larger matrices, use cofactor expansion.
Properties:
- \(|A|=|A^T|\)
- \(|AB|=|A||B|\)
- For \(A\in\mathbb{R}^{n\times n},|A|=0\) if and only if A is singular
- For \(A\in\mathbb{R}^{n\times n}\) and A not singular, \(|A^{-1}|=1/|A|\)
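These determinant properties are easy to confirm on small hypothetical matrices:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])  # |A| = 1*4 - 2*3 = -2
B = np.array([[2., 0.], [1., 3.]])  # |B| = 2*3 - 0*1 = 6

assert np.isclose(np.linalg.det(A), -2.0)
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))  # |A| = |A^T|
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))   # |AB| = |A||B|
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1 / np.linalg.det(A))                  # |A^{-1}| = 1/|A|
```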
to be continued…