Linear Algebra for Novice Data Scientists


Introduction

This is a brief summary of the linear algebra concepts needed for machine learning study. It should take about an hour to go through if you have studied the material before.

Basic concepts

By \(A\in\mathbb{R}^{m\times n}\) we denote a matrix with \(m\) rows and \(n\) columns.

By \(x\in\mathbb{R}^n\) we denote a vector with \(n\) entries.

\(a_{ij}\) denotes the entry of \(A\) in the \(i\)th row and \(j\)th column.
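
As a quick illustration, here is how these objects look in code. This is a minimal sketch using NumPy; the particular values are arbitrary examples:

    import numpy as np

    # A 2x3 matrix: 2 rows and 3 columns, i.e. A in R^{2x3}
    A = np.array([[1, 2, 3],
                  [4, 5, 6]])

    # A vector with 3 entries, i.e. x in R^3
    x = np.array([7, 8, 9])

    print(A.shape)   # (2, 3): m rows, n columns
    print(A[0, 1])   # the entry a_{12}; NumPy indices start at 0, so this prints 2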

Matrix Multiplication

For matrices \(A\in\mathbb{R}^{m\times n}\) and \(B\in\mathbb{R}^{n\times p}\), their product is the matrix \(C = AB \in \mathbb{R}^{m\times p}\), where \(C_{ij}=\sum\limits_{k=1}^nA_{ik}B_{kj}\).
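
The definition can be checked directly against NumPy's built-in matrix product. A small sketch (the matrices are arbitrary examples):

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])        # A in R^{3x2}
    B = np.array([[1, 0, 2],
                  [0, 1, 3]])     # B in R^{2x3}

    # C_{ij} = sum_k A_{ik} B_{kj}, computed with explicit loops
    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

    assert np.allclose(C, A @ B)  # matches NumPy's matrix product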

Operations and Properties

  • Identity Matrix: the square matrix with ones on the diagonal and zeros everywhere else. \(AI = A = IA\)
  • Transpose: the matrix that results from “flipping” rows and columns. \((A^T)^T = A, (AB)^T = B^TA^T, (A+B)^T=A^T+B^T\)
  • Symmetric Matrices: A matrix is symmetric if \(A = A^T\)
  • The Trace: the sum of the diagonal elements of a square matrix: \(trA=\sum\limits_{i=1}^nA_{ii}\). \(trA=trA^T, tr(A+B)=trA+trB, trAB = trBA\)
  • Norms: the norm \(\Vert x\Vert\) of a vector measures the “length” of the vector (these and the other properties in this list are checked numerically in the sketch after the list).

    • \({\Vert x \Vert}_2 = \sqrt{\sum\limits_{i=1}^n x_i^2}\)
    • \({\Vert x \Vert}_1 = \sum\limits_{i=1}^n |x_i|\)
    • \({\Vert x \Vert}_p = (\sum\limits_{i=1}^n |x_i|^p)^{1/p}\)

    • \({\Vert A \Vert}_F = \sqrt{\sum\limits_{i=1}^m\sum\limits_{j=1}^n A_{ij}^2} = \sqrt{tr(A^TA)}\) (the Frobenius norm of a matrix)

  • Linear Independence: A set of vectors \(\{x_1, x_2, \dots, x_n\}\) is said to be (linearly) independent if no vector can be represented as a linear combination of the remaining vectors.
  • Rank: the size of the largest subset of linearly independent columns (or rows) of A; the row rank always equals the column rank.
  • The Inverse: the inverse of a square matrix \(A\), denoted \(A^{-1}\), satisfies \(A^{-1}A = AA^{-1} = I\). \(A\) is invertible or non-singular if \(A^{-1}\) exists.
    • \((AB)^{-1} = B^{-1}A^{-1}\)
    • \((A^{-1})^T = (A^T)^{-1}\)
  • Orthogonal Matrices: Two vectors \(x, y \in\mathbb{R}^{n}\) are orthogonal if \(x^Ty = 0\). A square matrix \(U\) is orthogonal if all its columns are orthogonal to each other and normalized. Equivalently, \(U^TU=I=UU^T\)
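
The properties above are easy to sanity-check numerically. A minimal sketch with NumPy (the matrices are random examples; a random square matrix is invertible with probability 1, though not guaranteed):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))
    x = rng.standard_normal(3)

    # Transpose: (AB)^T = B^T A^T
    assert np.allclose((A @ B).T, B.T @ A.T)

    # Trace: tr(AB) = tr(BA)
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))

    # Norms: 2-norm and 1-norm of a vector, Frobenius norm of a matrix
    print(np.linalg.norm(x, 2), np.linalg.norm(x, 1))
    assert np.isclose(np.linalg.norm(A, 'fro'),
                      np.sqrt(np.trace(A.T @ A)))

    # Inverse: (AB)^{-1} = B^{-1} A^{-1}
    assert np.allclose(np.linalg.inv(A @ B),
                       np.linalg.inv(B) @ np.linalg.inv(A))

    # Orthogonal matrix from a QR decomposition: Q^T Q = I
    Q, _ = np.linalg.qr(A)
    assert np.allclose(Q.T @ Q, np.eye(3))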

The Determinant

Calculation: for \(2\times 2\) and \(3\times 3\) matrices, the determinant is the sum of the products along the main diagonals minus the sum of the products along the counter-diagonals, e.g. \(\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc\). For larger matrices it is computed by cofactor (Laplace) expansion.

Properties:

  • \(|A|=|A^T|\)
  • \(|AB|=|A||B|\)
  • For \(A\in\mathbb{R}^{n\times n}\), \(|A|=0\) if and only if A is singular
  • For \(A\in\mathbb{R}^{n\times n}\) and A non-singular, \(|A^{-1}|=1/|A|\)
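
These properties can also be verified numerically. A small sketch (the random matrices are almost surely non-singular):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    # |A| = |A^T|
    assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))

    # |AB| = |A||B|
    assert np.isclose(np.linalg.det(A @ B),
                      np.linalg.det(A) * np.linalg.det(B))

    # |A^{-1}| = 1 / |A|
    assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                      1.0 / np.linalg.det(A))

    # A singular matrix (repeated row) has determinant 0
    S = np.array([[1., 2.], [1., 2.]])
    print(np.linalg.det(S))  # 0.0 (up to floating-point error)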

to be continued…