\( \def\sc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \newcommand{\dee}{\mathrm{d}} \newcommand{\Dee}{\mathrm{D}} \newcommand{\In}{\mathrm{in}} \newcommand{\Out}{\mathrm{out}} \newcommand{\pdf}{\mathrm{pdf}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\ve}[1]{\mathbf{#1}} \newcommand{\mrm}[1]{\mathrm{#1}} \newcommand{\ves}[1]{\boldsymbol{#1}} \newcommand{\etal}{{et~al.}} \newcommand{\sphere}{\mathbb{S}^2} \newcommand{\modeint}{\mathcal{M}} \newcommand{\azimint}{\mathcal{N}} \newcommand{\ra}{\rightarrow} \newcommand{\mcal}[1]{\mathcal{#1}} \newcommand{\X}{\mathcal{X}} \newcommand{\Y}{\mathcal{Y}} \newcommand{\Z}{\mathcal{Z}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\z}{\mathbf{z}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\sgn}{\mathrm{sgn}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\Real}{\mathbb{R}} \newcommand{\sseq}{\subseteq} \newcommand{\ov}[1]{\overline{#1}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \)

15   2D Transformations


So far, we have learned how to write GLSL shaders, and we have introduced abstractions to make working with them easier. What we can draw so far with OpenGL are images that can be computed in the fragment shader (Chapter 13) and colored meshes (Chapter 14 and many previous chapters).

In this chapter, we introduce a new concept: transformations. A transformation is a mathematical operation that we can apply to the geometric data (points, lines, triangles, camera positions, and so on) that make up our scene so that such data change their positions and shapes. For example, given a mesh, we can apply a transformation to change its position in the scene, rotate it to face other direction, or scale it so that it becomes 2 times or 3 times bigger. We can use transformations to change how the scene is rendered; for examples, changing the zoom level or moving the final rendering around the canvas. Transformation is a fundamental concept in computer graphics, and mastering it is a requirement for a competent graphics practitioner.

Transformation, however, is quite a complicated topic in it self. In the end, we want to cover transformations on 3D objects because WebGL is all about rendering 3D scenes. Nevertheless, such 3D transformations are quite complex by themselves. Moreover, we have not really discussed how to deal with 3D data so far besides setting the $z$-coordinates to zero so as to not worry about the third dimension like in the last chapter. So, in this chapter, we shall limit ourselves to the discussions of 2D transformations. They are much easier to deal with because they are very easy to visualize with what we have learned so far, and there are fewer types of 2D transformations we need to discuss.

15.1   Definitions and examples

Mathematically, a 2D transformation is a function that consumes a 2D point and outputs a 2D point. More formally, we say that $f$ is a 2D transformation if its maps $\Real^2$ to $\Real^2$. In other words, its signature is $$f: \Real^2 \ra \Real^2.$$

15.1.1   The simplest transformations

Let us look at some examples. One of the simplest transformations is the identity transformation $\mathtt{I}$, which simply outputs whatever the input is: \begin{align*} \mathtt{I}( \ve{x} ) = \ve{x} \end{align*} for every $\ve{x} \in \Real^2$. So, \begin{align*} I((0,0)) &= (0,0), & I((1,2)) &= (1,2),& I((-5,3)) &= (-5, 3), \end{align*} and so on. (To simplify the notation, we will omit double parentheses and write $f(-5, 3)$ in place of $f((-5,3))$ from now on.)

Other transformations are more complicated to define as they involve some parameters to define their behavior. An example of transformations with parameters are the constant functions $\mathtt{C}_{\ve{a}}$ where $\ve{a} \in \Real^2$ is the parameter. The constant function always outputs $\ve{a}$ no matter what the input is: \begin{align*} \mathtt{C}_{\ve{a}}(\ve{x}) = \ve{a} \end{align*} for all $\ve{x} \in \Real^2$. So, \begin{align*} \mathtt{C}_{(2,1)}(10, 15) &= (2,1), & \mathtt{C}_{(-1,1)}(0, 0) &= (-1,1), & \mathtt{C}_{(46,49)}(39, 888) &= (46,49). \end{align*} (Again, when the parameter is a 2D point, we may drop the parantheses around it to make the notation simpler. That is, we may write $\mathtt{C}_{2,1}$ to mena $\mathtt{C}_{(2,1)}$.) Note that, while there is only one identity transformation, there are as many constants functions as there are points in $\Real^2$.

15.1.2   Translation and scaling

A type of transformation that is used extensively in computer graphics is the translation. Like a constant function, a translation $\mathtt{T}_{\ve{a}}$ is parametered by a point $\mathbf{a} \in \Real^2$. What it does is to add $\ve{a}$ to the input: \begin{align*} \mathtt{T}_{\ve{a}}(\ve{x}) = \ve{x} + \ve{a}. \end{align*} So, \begin{align*} \mathtt{T}_{1,1}(5,6) &= (6,7), & \mathtt{T}_{-2,3}(7,8) &= (5,11), & \mathtt{T}_{-10,-20}(100,200) &= (90,180). \end{align*}

Lastly, another type of transformation that is used extensively in computer graphics is scaling. It is also parameterized by a point $\ve{a} \in \Real^2$, but it is better to treat the points as an ordered pair $(a_1, a_2)$. The scaling $\mathtt{S}_{a_1,a_2}$ simply multiplies $a_1$ to the $x$ component of the input and $a_2$ to the $y$-component of the input. In other words, \begin{align*} \mathtt{S}_{a_1, a_2}\bigg( \begin{bmatrix} x \\ y \end{bmatrix} \bigg) = \begin{bmatrix} a_1 x \\ a_2 y \end{bmatrix} \end{align*} So, \begin{align*} \mathtt{S}_{2,2}(10, 9) &= (20, 18), & \mathtt{S}_{4,5}(-3, 2) &= (-12, 10), & \mathtt{S}_{20,10}(-1,-2) &= (-20, -20). \end{align*}

15.1.4   Transformations in general

In general, a transformation is a function $f: \Real^n \ra \Real^m$ that maps a point from an $n$-dimension space $\Real^n$ to an $m$-dimension space $\Real^m$. The set $\Real^n$ of the input points to $f$ is called the domain, and the set $\Real^m$ of output points of $f$ is the called the range.

When $n = m$ (in other words, the function signature is $f: \Real^n \ra \Real^n$), we call $f$ an $n$-dimensional transformation. While this chapter is exclusively about 2-dimensional (2D) transformations, we will inevitably have to deal with 3D transformations when we talk about a certain type of 2D transformations. Similarly, when we talk about 3D transformations in a later chapter, we will have to discuss 4D transformations as well.

15.2   Why transformations matter

So far, we have introduced transformation as a mathematical concept. This, however, is very abstract and quite removed from the visual world of computer graphics. However, transformations are used in graphics because they we can apply them to graphics data and see the results with our eyes. In this section, we shall see how we can apply transformations to meshes why they are useful to graphics practioners.

Meshes are made of vertices, and all vertices must have one attribute: their positions. For a 2D mesh, the position is a point in $\Real^2$. If we apply a 2D transformation to it, we get another position. In other words, when we apply a transformation to a vertex, we move it to a new position. So, if we apply the same transformation to all vertices in a mesh, we can change both the mesh's positions and shape. Let's see how the transformations we have learned so far affect 2D meshes.

For the identity transformation $\mathtt{I}$, we know that it does not change any vertex's position at all, so any mesh would remain unchanged if we apply the identity transformation to it. For the constant transformation $\mathbb{C}_{a,b}$, all vertices are moved $(a,b)$, and so the mesh is reduced to a single point. So, these two transformations are not useful on their own.

The translation $\mathtt{T}_{a,b}$ is a little more interesting. We know that it adds the vector $(a,b)$ to all vertex positions. For example, let us say we have a square mesh made with 4 vertices at $(0, 0)$, $(1,0)$, $(1,1)$, and $(0,1)$. If we apply $T_{-1,2}$ to the mesh, then the vertex position becomes $(-1,2)$, $(0, 2)$, $(0, 3)$, and $(-1, 3)$, respectively. The overall effect is presented in the figure below.

4 3 2 1 0 -1 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{T}_{-1,2}}}$ 4 3 2 1 0 -1 -2 -1 0 1 2 3
Figure 14.1 The effect of applying a translation to a square mesh.

In general, the effect of applying $\mathtt{T}_{a,b}$ to a mesh is to move the mesh to the right by $a$ units and then to upward by $b$ units. Note that $a$ and $b$ are signed. So, moving to the right by a negative number of units means moving to the left, and moving upward by a negative number of units means moving downward.

Next, let's look at scaling $\mathtt{S}_{a,b}$. When we apply it to the square mesh above, the vertices become $(0,0)$, $(a,0)$, $(a,b)$, and $(0,b)$. What the results looks like depends on the sign of $a$ and $b$. When $a$ and $b$ are both positive, it scales the mesh up $a$ times in the $x$-direction and $b$-times in the $y$-direction. Note that $a$ and $b$ can be less than $1$, which means that it can also shrinks (i.e., scales down) the mesh too.

4 3 2 1 0 -1 -1 0 1 2 3 4 ${\huge \xrightarrow{\mathtt{S}_{3,2}}}$ 4 3 2 1 0 -1 -1 0 1 2 3 4
4 3 2 1 0 -1 -1 0 1 2 3 4 ${\huge \xrightarrow{\mathtt{S}_{0.5,0.5}}}$ 4 3 2 1 0 -1 -1 0 1 2 3 4
Figure 14.2 The effects of scaling with positive scaling factors.

When $a$ is negative, the mesh "flips" horizontally. Any point to the right of the vertical line $x = 0$ is moved to the right side, and any point to the right would move to the left. The coordinate is scaled by a factor of $|a|$. Similarly, when $b$ is negative, the mesh "flips" vertically. Points above the horizontal line $y = 0$ go under, and points under the line go above.

3 2 1 0 1 -2 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{S}_{-1,-1}}}$ 3 2 1 0 1 -2 -2 -1 0 1 2 3
3 2 1 0 1 -2 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{S}_{-1,2}}}$ 3 2 1 0 1 -2 -2 -1 0 1 2 3
3 2 1 0 1 -2 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{S}_{3,-2}}}$ 3 2 1 0 1 -2 -2 -1 0 1 2 3
Figure 14.3 The effects of scaling with negative scaling factors.

One thing to nice is that, for scaling, there is a special point: the origin $(0,0)$. The origin is never changed by a scaling because multipying $0$ with any number always result in $0$. Other points are moved according to the scaling factors $a$ and $b$. It is as though "scape" itself is warped around $(0,0)$. So, we may call $(0,0)$ the center of scaling. To repeat, it is very important to remember that scaling always happen around a point. On the other hand, there is no "center of translation" because a translation move all points in the same way.

From the examples we have seen so far, we can see why transformations are useful to computer graphics. Using translation, we can move a mesh around the scene. Using scaling, we can make a mesh larger or smaller to our liking. Using these two transformations togeter allows us to resize a mesh and place it to where we like in a scene.

15.3   Transformation arithematic

Like many other mathematical objects, we can perform operations such as addition, subtraction, and multiplication on transformations to another transformation. This allows us to build more complex transformations from simpler transformations, and it makes transformation another kind of "numbers," much like vectors. In this section, we will learn about three operations on transformations that are used widely in computer graphics: composition, addition, and scalar multiplication.

15.3.1   Transformation equality

Let us talk, though, about how to decide whether two transformations are the same or not first. This will help us make more sense of the operations we are about to learn. We say that two transformations $f: \Real^n \ra \Real^m$ and $g: \Real^n \ra \Real^m$ are equal if the results of applying $f$ and $g$ to any point $\ve{x} \in \Real^m$ are the same. In mathematcal notations:

$f = g$ if $f(\ve{x}) = g(\ve{x})$ for all $\ve{x} \in \Real^n$.

In other words, $f$ and $g$ have the exact same effect on every point. We can swap $g$ in place of $f$, apply the function, and no one would notice any differences in the result. If there is a point $\ve{x}$ where $f(\ve{x}) \neq g(\ve{x})$, then $f$ and $g$ are not equal, and we would write $f \neq g$.

Let us observe equality between transformations we have seen so far. The identity transformation $\mathtt{I}$ returns the input $\ve{x}$ exactly without change. So does the translation by $(0,0)$, $\mathbb{T}_{0,0}$, and the scaling by $(1,1)$, $\mathtt{s}_{1,1}$. As a result, $$ \mathtt{I} = \mathtt{T}_{0,0} = \mathtt{S}_{1,1}.$$

The case above is the only case where a translation can be equal to a scaling. If $(a,b) \neq (0,0)$, them $\mathtt{T}_{a,b} \neq \mathtt{S}_{c,d}$ for any $(c, d) \in \Real^2$. This is because there is a point, $(0,0)$, where the effects of the two transformations differ: \begin{align*} T_{a,b}(0,0) = (a,b) \neq (0,0) = S_{c,d}(0,0). \end{align*}

15.3.2   Transformation composition

With the discussion on equality out of the way, let us talk about the first operation on transformations. Given a transformation $f: \Real^n \ra \Real^m$ and another 2D transformation $g: \Real^m \ra \Real^r$, we can create another transformation $h: \Real^n \ra \Real^r$ by just applying $f$ and then $g$. So, the deintion of $h$ is given by the equation $$h(\ve{x}) = g(f(\ve{x}))$$ for all $\ve{x} \in \Real^n$. Here, $h$ is called the composition of $f$ and $g$ and is denoted by $g \circ f$. As a result, we can write: $$(g \circ f)(\ve{x}) = g(f(\ve{x})).$$

Let us see an example. Let $f = \mathtt{T}_{1,0}$ and $g = \mathtt{S}_{2,2}$. We have that \begin{align*} (g \circ f)(x,y) &= g(f(x,y)) = \mathtt{S}_{2,2}(\mathtt{T}_{1,0}(x,y)) = \mathtt{S}_{2,2}(x+1, y) = (2x+2, 2y). \end{align*}

Notice we can only compose $f$ with $g$ only if the range of $f$ has the same number of dimensions as the domain of $g$. In this example, $f$ has signature $\Real^2 \ra \Real^2$, and $g$ has signature $\Real^2 \ra \Real^2$. We can form $g \circ f$ because the number of dimensions after the arrow in the signature of $f$ is the same as the number of dimensions before the arrow in the signature of $g$.

In this case, it also valid to form $f \circ g$, where, here, the order we apply the function is reversed. \begin{align*} (f \circ g)(x,y) &= f(g(x,y)) = \mathtt{T}_{1,0}(\mathtt{S}_{2,2}(x,y)) = \mathtt{T}_{1,0}(2x, 2y) = (2x+1, 2y). \end{align*} We see can that $f \circ g \neq g \circ f$ because their effects are different on all points.

Composition is generally not commutative. The order at which we apply transformations does matter. To begin with, let us say that $f$ has signature $\Real^n \ra \Real^m$ and $g$ has signature $\Real^m \ra \Real^r$. If $n \neq r$, then, the composition $g \circ f$ is well define, but it does not not make sense to talk about the composition $f \circ g$. This is because $g(\ve{x})$ would be a point with $r$ components, but $f$ only takes points with $n$ components. So, we can say that $g \circ f$ is not equal to $f \circ g$ because the former exists, but the latter does not. In cases where $n = r$ and $f \circ g$ is well defined, then we cannot guarantee that $g \circ f = f \circ g$ either, just like what happens in the example above.

On the other hand, if the order of transformations is fixed, it does not matter how we group the transformations in this order. More concretely, suppose we have three transformation $f$, $g$, and $h$ that we will apply to a point $\ve{x}$ in that order. The result is always $$h(g(f(\ve{x}))).$$ It does not matter if we first compose $f$ and $g$ to get $g\circ f$ and then compose it with $h$ before applying the result. In other words, \begin{align*} (h \circ (g \circ f))(\ve{x}) = h((g\circ f)\ve{x}) = h(g(f(\ve{x}))). \end{align*} It also does not matter if we compose $g$ and $h$ to get $h\circ g$ and then compose it with $f$ later: \begin{align*} ((h \circ g) \circ f)(\ve{x}) = (h \circ g)(f(\ve{x})) = h(g(f(\ve{x}))). \end{align*} This is because, in the end, all the transformations are applied in the same order. Mathematically, this fact is captured by the associative property: \begin{align*} (h \circ g) \circ f = h \circ (g \circ f) \end{align*} for any transformations $f$, $g$, and $h$. As a result, when we a long chain of transformations such as $f_5 \circ f_4 \circ f_3 \circ f_2 \circ f_1$, there is no need to write any parentheses between them because any parenthesization would yield the same transformation nonetheless. To repeat, composition is associative. As long as the order does not change, different groupings of transformations within the order do not produce different results.

While composition is in general not commutative, there are special cases where they are. The first special case is that the identity transformation commutes with any 2D transformation: for any $f \in \Real^2 \ra \Real^2$, we have that \begin{align*} \mathtt{I} \circ f = f \circ \mathtt{I}. \end{align*} The second special case is that two translations commute with each other: \begin{align*} \mathtt{T}_{a,b} \circ \mathtt{T}_{c,d} = \mathtt{T}_{a+c, b+d} = \mathtt{T}_{c,d} \circ \mathtt{T}_{a,b}. \end{align*} So do two scalings: \begin{align*} \mathtt{S}_{a,b} \circ \mathtt{S}_{c,d} = \mathtt{S}_{ac, bd} = \mathtt{S}_{c,d} \circ \mathtt{S}_{a,b}. \end{align*} Commutativity generally breaks when you start mixing different types of transformations together.

15.3.3   Transformation addition

A transformation produces a vector, which can be added. We can take this operation on vectors and then "lift" it to become an operation on transforamtions. More precisely, let $f$ and $g$ be two transformations with the same signature (say $\Real^n \ra \Real^m$), the sum of $f$ and $g$ is the transformation $h$ such that \begin{align*} h(\ve{x}) = f(\ve{x}) + g(\ve{x}) \end{align*} for all $x \in \Real^n$. For convenience, we shall write the sum of $f$ and $g$ as $f + g$. This gives: \begin{align*} (f+g)(\ve{x}) = f(\ve{x}) + g(\ve{x}). \end{align*} Transformation addition inherits many properties from vector addition (and addition in general). It is commutative and associative.

For the composition operator, the identity transformation $\mathtt{I}$ is the magic transformation that, when composed with another 2D ransformation, would leave the other transformation unchanged. For addition, this is no longer the case. This is because \begin{align*} (\mathtt{I} + f)(\ve{x}) = \mathtt{I}(\ve{x}) + f(\ve{x}) = \ve{x} + f(\ve{x}). \end{align*} We have that, the RHS would not be equal to the LHS unless $\ve{x} = (0,0)$, and so $I+f \neq f$. The magic transformatioin in this case is the constant transformation $\mathtt{C}_{0,0}$: \begin{align*} (\mathtt{C}_{0,0} + f)(\ve{x}) = \mathtt{C}_{0,0}(\ve{x}) + f(\ve{x}) = (0,0) + f(\ve{x}) = f(\ve{x}). \end{align*}

Adding two scalings together results in a scaling where the parameters are added. \begin{align*} (\mathtt{S}_{a,b} + \mathtt{S}_{c,d})(x,y) &= \mathtt{S}_{a,b}(x,y) + \mathtt{S}_{c,d}(x,y) \\ &= (ax, by) + (cx, dy) \\ &= \big( (a+c)x, (b+d) y \big) \\ &= \mathtt{S}_{a+c,b+d}(x,y). \end{align*} So, \begin{align*} \mathtt{S}_{a,b} + \mathtt{S}_{c,d} = \mathtt{S}_{a+c,b+d} \end{align*}

However, adding two translations together do not produce the same result as above. \begin{align*} (\mathtt{T}_{a,b} + \mathtt{T}_{c,d})(x,y) &= \mathtt{T}_{a,b}(\ve{x}) + \mathtt{T}_{c,d}(x,y) \\ &= (x+a, y+b) + (x+c,y+d) \\ &= (2x+a+b, 2y+b+d) \\ &= (x,y) + \big((x,y) +(a+c,b+d)) \\ &= \mathtt{I}(x,y) + \mathtt{T}_{a+c,b+d}(x,y) \end{align*} So, \begin{align*} \mathtt{T}_{a,b} + \mathtt{T}_{c,d} = \mathtt{I} + \mathtt{T}_{a+c,b+d} \end{align*}

In fact, we can say that a translation is the identity transformation added to a constant transformation:

\begin{align*} \mathtt{T}_{a,b}(x,y) &= (x+a,y+b) \\ &= (x,y) + (a,b) \\ &= \mathtt{I}(x,y) + \mathtt{C}_{a,b}(x,y) \\ &= (\mathtt{I} + \mathtt{C}_{a,b})(x,y), \end{align*} which gives us \begin{align*} \mathtt{T}_{a,b} = \mathtt{I} + \mathtt{C}_{a,b}. \end{align*}

15.3.4   Transformation scalar multiplication

Similar to the previous section, we can take another vector operation, multiplication by a scalar, and define an operation on transformation with it. Let $f: \Real^n \ra \Real^m$ be a transformation and $c \in \Real$ be a scalar. The product of $f$ and $c$ is a transformation $g$ such that \begin{align*} g(\ve{x}) = c f(\ve{x}) \end{align*} for all $\ve{x} \in \Real^n$. We write the product as $cf$. Note that the scalar multiplication is conducted to after we apply the transformation. The order is important here.

If $f: \Real^2 \ra \Real^2$ is a 2D transformation, then muliplication by scalar is actually the same as composition with a scaling: \begin{align*} (c f)(\ve{x}) = cf(\ve{x}) = \mathtt{S}_{c,c}(f(\ve{x})) = (\mathtt{S}_{c,c} \circ f)(\ve{x}), \end{align*} and so \begin{align*} cf = \mathtt{S}_{c,c} \circ f. \end{align*}

The reader might think it is pointless to introduce a new operation that can be defined in terms of previous ones. However, scalar multiplication gives us a a way to write expressions involving transformation more succinctly. A scaling $\mathtt{S}_{c,c}$ where the two scaling parameters are equal is called a uniform scaling. We can replace a composition with such a scaling (which we would write $\mathtt{S}_{c,c} \circ$) with just $c$.

Another convenience afforded by scalar multiplication is the definition of transformation subtraction, which we can defined as: \begin{align*} f - g = f + (-1)g. \end{align*}

Scalar multiplication inherits properties from the multiplication operators for real numbers. For one, the order we multiply the scalars do not matter: \begin{align*} c(df) = (cd)f = (dc)f = d(cf). \end{align*} When paired with transformation addition, we also have the distributive rules: \begin{align*} c(f + g) &= cf + cg \\ (c + d)f &= cf + df \end{align*} In fact, one may show that the distributive rules also apply to scaling transformations. \begin{align*} \mathtt{S}_{a,b} \circ (f + g) &= \mathtt{S}_{a,b} \circ f + \mathtt{S}_{a,b} \circ g \\ (\mathtt{S}_{a,b} + \mathtt{S}_{c,d}) \circ f &= \mathtt{S}_{a,b} \circ f + \mathtt{S}_{c,d} \circ f \end{align*}

15.4   Linear transformations

We have studied some simple types of transformations. It is time to study another type of transformations that is more complicated than the one we have encountered before: linear transformation. Linear transformations are widely used in computer graphics because they encompass useful transformations and also because they can easily represented by data, not programs.

We say that a transformation (or a function) $f: \Real^n \ra \Real^m$ is linear if the following conditions are true:

  1. $f(\ve{x} + \ve{y}) = f(\ve{x}) + f(\ve{y})$ for any $\ve{x}, \ve{y} \in \Real^n$, and
  2. $f(c\ve{x}) = cf(\ve{x})$ for all $\ve{x} \in \Real^n$ and $c \in \Real$.
Condition (1) says that the transformation $f$ "distributes" like how a multiplication with scalar interacts with addition. Condition (2) says that the function "commutes" with scalar multiplcation.

Let's classify the 2D transformations we have seen so far whether they are linear or not. The constant function is generally not linear. If $(a,b) \neq (0,0)$ we have that \begin{align*} \mathtt{C}_{a,b}\big((x,y) + (0,0)\big) &= \mathtt{C}_{a,b}(x,y) \\ &= (a,b) \\ &\neq (a,b) + (a,b) \\ &= \mathtt{C}_{a,b}(x,y) + \mathtt{C}_{a,b}(0,0). \end{align*} The reader should check that $\mathtt{C}_{0,0}$ is linear.

Translation is also not linear if $(a,b) \neq (0,0)$. This is because \begin{align*} \mathtt{T}_{a,b}((x,y) + (0,0)) &= \mathtt{T}_{a,b}((x,y) + (0,0)) \\ &= (x,y) + (a,b) \\ &\neq (x,y) + (a,b) + (a,b) \\ &= \mathtt{T}_{a,b}(x,y) + \mathtt{T}_{a,b}(0,0) \end{align*} The only translation that is linear is $\mathtt{T}_{0,0} = \mathtt{I}$.

On the other hand, scaling is linear. This is because \begin{align*} \mathtt{S}_{a,b}((x_0, y_0) + (x_1, y_1)) &= \mathtt{S}_{a,b}(x_0 + x_1, y_0 + y_1) \\ &= (a(x_0 + x_1), b(y_0 + y_1)) \\ &= (ax_0 + ax_1, by_0 + by_1) \\ &= (ax_0, by_0) + (ax_1, by_1) \\ &= \mathtt{S}_{a,b}(x_0, y_0) + \mathtt{S}_{a,b}(x_1, y_1), \end{align*} and \begin{align*} \mathtt{S}_{a,b}(c(x, y)) &= \mathtt{S}_{a,b}(c(x, y)) \\ &= \mathtt{S}_{a,b}(cx, cy) \\ &= (acx, bcy) \\ &= c(ax,by) \\ &= c\,\mathtt{S}_{a,b}(x, y). \end{align*} Because $\mathtt{I} = \mathtt{S}_{1,1}$, we have that the identity transformation is linear as well.

One characteristic of a linear transformation is that it must preserve the point $(0,0)$. Let us say that $f$ is linear, then \begin{align*} f(0,0) = f((0,0) + (0,0)) = f(0,0) + f(0,0) = 2f(0,0), \end{align*} which meanes that $f(0,0) = (0,0)$. This is one of the reason why, if $(a,b) \neq (0,0)$, then $\mathtt{C}_{a,b}$ and $\mathtt{T}_{a,b}$ are not linear.

15.5   2D rotation

In Section 15.1, we learned about two types of transformations that are important to computer graphics: translation and scaling. There is a third type: rotation. In this chapter, we shall learn about rotations in 2D, and we will study rotations in 3D later in Chapter XXX.

In 2D, a rotation has a parameter, the angle $\theta$, and is denoted by $\mathtt{R}_\theta$. Given a point $\mathbf{x}$, the action of $\mathtt{R}_\theta$ on $\ve{x}$ is as follows.

  1. Interpret $\ve{x}$ as a vector from point $(0,0)$ to $\ve{x}$.
  2. Rotate the vector by angle $\theta$ in the counterclockwise direction.

Let us take the above description step by step. Let us consider a rotation by $\theta = 90^\circ$, and let us apply it to a point, say $(2,2)$. The first thing to do is to visualize $(2,1)$ on the 2D plane.

3 2 1 0 1 -2 -2 -1 0 1 2 3
Figure 14.4 The point $(2,1)$ visualized on the 2D plane.

Then, we should interpret the point as a vector from $(0,0)$, the origin. This means we draw an arrow from $(0,0)$ to $(1,2)$ and forget, for the moment, that the point ever exists.

3 2 1 0 1 -2 -2 -1 0 1 2 3
Figure 14.5 The point $(2,1)$ interpreted as a vector.

Next, we rotate the vector by $90^\circ$ in the counterclockwise rotation, using the point $(0,0)$ as the center of rotation.

3 2 1 0 1 -2 -2 -1 0 1 2 3 90°
Figure 14.6 The vector $(2,1)$ being rotated by $90^\circ$.

Lastly, we interpret the rotated vector as a point. The result is $(-1,2)$. In other words, $\mathtt{R}_{90^\circ}(2,1) = (-1,2).$

3 2 1 0 1 -2 -2 -1 0 1 2 3 90°
Figure 14.7 The rotated vector interpretted as a point.

Let us try to derive a formula for rotation. Say, we are given a point $(x,y)$, which we would like to rotate by the angle $\theta$. We know that we can always find a real number $r \geq 0$ and $\phi$ such that \begin{align*} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} r \cos \phi \\ r \sin \phi \end{bmatrix}. \end{align*} This is basically writing $(x,y)$ in terms of polar coordinates (Chapter 4.5.1). In other words, we interpet the point as a vector of length $r$ that makes an anle $\phi$ with the positive $x$-axis. After we rotate this vector by the angle $\theta$, the vector length remains the same. (In other words, $r$ does not change.) However, the angle that the vector makes with the positive $x$-axis changes from $\phi$ to $\phi + \theta$. Symbolically, this means that \begin{align*} \mathtt{R}_{\theta} \left( \begin{bmatrix} r \cos \phi \\ r \sin \phi \end{bmatrix} \right) = \begin{bmatrix} r \cos (\phi + \theta) \\ r \sin (\phi + \theta) \end{bmatrix} \end{align*} Let us now recall the rules for sine and cosine addition from high school. \begin{align*} \cos(\phi + \theta) &= \cos \phi \cos\theta - \sin\phi \sin\theta \\ \sin(\phi + \theta) &= \sin \phi \cos\theta + \cos\phi \sin\theta. \end{align*} As a result, \begin{align*} \begin{bmatrix} r \cos (\phi + \theta) \\ r \sin (\phi + \theta) \end{bmatrix} &= \begin{bmatrix} r \cos \phi \cos\theta - r \sin\phi \sin\theta \\ r \sin \phi \cos\theta + r \cos\phi \sin\theta \end{bmatrix} \\ &= \begin{bmatrix} (r \cos \phi) \cos\theta - (r \sin\phi) \sin\theta \\ (r \sin \phi) \cos\theta + (r \cos\phi) \sin\theta \end{bmatrix} \\ &= \begin{bmatrix} x \cos\theta - y \sin\theta \\ y \cos\theta + x \sin\theta \end{bmatrix} \\ &= \begin{bmatrix} x \cos\theta - y \sin\theta \\ x \sin\theta + y \cos\theta \end{bmatrix}. \end{align*} In other words, \begin{align*} \mathtt{R}_\theta(x,y) = \begin{bmatrix} x \cos\theta - y \sin\theta \\ x \sin\theta + y \cos\theta \end{bmatrix}. \end{align*}

From the formula, we can deduce that a rotation is a linear transformation. This is because \begin{align*} \mathtt{R}_{\theta}((a,b) + (c,d)) &= \mathtt{R}_{\theta}(a+c, b+d) \\ &= \begin{bmatrix} (a+c) \cos\theta - (b+d) \sin\theta \\ (a+c) \sin\theta + (b+d) \cos\theta \end{bmatrix} \\ &= \begin{bmatrix} a \cos\theta - b \sin\theta \\ a \sin\theta + b \cos\theta \end{bmatrix} + \begin{bmatrix} c \cos\theta - d \sin\theta \\ c \sin\theta + d \cos\theta \end{bmatrix} \\ &= \mathtt{R}_{\theta}(a, b) + \mathtt{R}_{\theta}(c, d), \end{align*} and \begin{align*} \mathtt{R}_{\theta}\big(c\,(a,b)\big) &= \mathtt{R}_{\theta}(ca,cb) \\ &= \begin{bmatrix} ca \cos\theta - cb \sin\theta \\ ca \sin\theta + cb \cos\theta \end{bmatrix} \\ &= c \begin{bmatrix} a \cos\theta - b \sin\theta \\ a \sin\theta + b \cos\theta \end{bmatrix} \\ &= c\, \mathtt{R}_{\theta}(a,b). \end{align*}

Because a rotation, it is a linear transformation, we have that $\mathtt{R}_{\theta}(0,0)$. This makes sense because, after all, the point $(0,0)$ is the center of rotation, which will not due to the rotation. The rotation by $0^\circ$ leaves all points unchanged because there is no rotation, and so \begin{align*} \mathtt{R}_{0^\circ} = \mathtt{I} = \mathtt{T}_{0,0}. \end{align*}

15.6   Matrices

We said earlier in Section 15.4, we said that linear transformations are widely used in computer graphics because they can be represented by data. In this section, we shall study matrices (singular: matrix), the mathematical objects and the data type that can be used to repreesent linear transformations.

15.6.1   Definitions and notations

Formally, a matrix is a two-dimensional table of numbers. We write down the table and surround them by square brackets. For example,

\begin{align*} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix}. \end{align*}

Each number in the matrix is called an element, so the matrices above has 12 elements. A matrix is divided vertically into rows

1 2 3 4 4 6 7 8 9 10 11 12

and horizontally into columns.

1 2 3 4 4 6 7 8 9 10 11 12

A matrix with $m$ rows and $n$ columns are said to be of size $m \times n$, so the matrix above is of size $3 \times $. We index the rows with natural numbers: the top row is Row 1, the second row from the top is Row 2, and so on. We index the columns similarly from left to right: the leftmost column is Column 1, the second leftmost column is Column 2, and so on.

1 2 3 4 4 6 7 8 9 10 11 12 1 2 3 1 2 3 4

We refer to an element in the matrix with the indices of its row and column, in that order. As a result, the top-left entry of the matrix above is Element $(1,1)$, and the bottom-right element is Element $(3,4)$. The Element $(2,3)$ is the element on Row 2 and Column 3, which is $7$. On the other hand, the element $(3,2)$ is the Element on Row 3 and Column 2, which is $10$.

We often use a upper-case letter such as $A$, $B$, and $C$ to represent a matrix, and we use the corresponding lower-case letter to represents its element.The symbol for the row element is subscripted with the indices of the row and the column, in that order. So, Element $(2,3)$ of matrix $A$ is denoted by $a_{23}$, and Element $(3,2)$ is denoted by $a_{32}$. So, if $A$ is a $3 \times 4$ matrix, then it can be written as follows. \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{33} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}. \end{align*} In case $A$ is of size $m \times n$, then \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{33} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots &\vdots &\ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}. \end{align*}

We often write 2D and 3D vectors using the "columm vector notation". \begin{align*} \ve{a} &= \begin{bmatrix} a_x \\ a_y \end{bmatrix}, & \ve{b} &= \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix}. \end{align*} Notice that a 2D vector looks just like a $2 \times 1$ matrix, and a 3D vector a $3 \times 1$ matrix. In fact, while they are not exactly the same thing, we can always interpret an $n$-dimensional vector as a $n \times 1$ matrix, and we shall do so when it is convenient to do so from now on.

All of the matrices we will deal with in this book, the entries are real numbers. The set of $m \times n$ matrices whose entries are real numbers are denoted by the symbol $\mathbb{R}^{m \times n}$.

15.6.2   Matrix representation of 2D transformations

Earlier in Section 4.1.1, we learned that all 2D vectors can be written in terms of two special vectors $\hat{\ve{x}} = (1,0)$ and $\hat{\ve{y}} = (0,1)$. More, specifically, we have that \begin{align*} (x,y) = (x, 0) + (0,y) = x\,(1,0) + y\,(0,1) = x\hat(\ve{x}) + y \hat(\ve{y}). \end{align*}

Now, let $f$ be a linear transformation, we have that \begin{align*} f(x,y) = f(x\hat{\ve{x}} + y \hat{\ve{y}}) = x\,f(\hat{\ve{x}}) + y\,f(\hat{\ve{y}}) = x\,f(1,0) + y\,f(0,1). \end{align*} The above equation is kind of remarkable. It says that, if we know the effect of a linear transformations $f$ on just two vectors—$\hat{\ve{x}} = (1,0)$ and $\hat{\ve{y}} = (0,1)$—then we know what it does to all the other vectors. If we are given $(x,y)$, we just multiple $x$ to $f(1,0)$ and $y$ to $f(0,1)$ and then add the results together.

This means that we can represent any linear transformations $f$ by $4$ numbers. This is because $f(1,0)$ is a 2D vector, which can be represented by 2 numbers, and so is $f(0,1)$. Let us give names to the components of these vectors. \begin{align*} f(1,0) &= \begin{bmatrix} a \\ b \end{bmatrix}, & f(0,1) &= \begin{bmatrix} c \\ d \end{bmatrix}. \end{align*} The conventional way to write down a representation of $f$ is to take the vectors $f(1,0)$ and $f(0,1)$ side by side \begin{align*} \begin{bmatrix} a \\ b \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} \end{align*} and then remove the borders between them to get a matrix \begin{align*} \begin{bmatrix} a & c \\ b & d \end{bmatrix}. \end{align*}

To recap, the matrix representation of a 2D linear transformation $f$ is obtained by the following process.

  1. Compute $f(1,0)$ and $f(0,1)$.
  2. Form a $2 \times 2$ matrix whose first column is the column vector $f(1,0)$ and whose second column is the column vector $f(0,1)$.

Let's see what the matrix representations of the 2D linear transformations we have learn so far look like. The simplest of these transformations is the identity transformation $\mathtt{I}$. Because \begin{align*} \mathtt{I}(1,0) &= \begin{bmatrix} 1 \\ 0 \end{bmatrix}, & \mathtt{I}(0,1) &= \begin{bmatrix} 0 \\ 1 \end{bmatrix},& \end{align*} the matrix representation of $\mathtt{I}$ is the matrix \begin{align*} I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \end{align*} The matrix $I$ has a special name: it is called the identity matrix after the transformation that it represents.

As we will see later, a matrix that represents a linear transformation in an $n$-dimension space will be of size $n \times n$. That is, there are as many rows as there are columns. Such a matrix is called a square matrix.

Next, let us look at the scaling transformation $\mathtt{S}_{a,b}$. Because \begin{align*} \mathtt{S}_{a,b}(1,0) &= \begin{bmatrix} a \\ 0 \end{bmatrix}, & \mathtt{S}_{a,b}(0,1) &= \begin{bmatrix} 0 \\ b \end{bmatrix},& \end{align*} the matrix representation of $\mathtt{S}_{a,b}$ is the matrix \begin{align*} S_{a,b} = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}. \end{align*}

The matrix representation for the rotation $\mathtt{R}_{\theta}$ is more complicated. Note that \begin{align*} \mathtt{R}_{\theta}(1,0) &= \begin{bmatrix} \cos \theta \\ \sin\theta \end{bmatrix}, & \mathtt{R}_{\theta}(0,1) &= \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.& \end{align*} As a result, the matrix representation of $\mathtt{R}_\theta$ is \begin{align*} R_{\theta} = \begin{bmatrix} \cos \theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \end{align*} We can substitute angles whose sine and cosine values are well-known to see what the matrices of the rotations by these angles are. For examples, \begin{align*} R_{90^\circ} &= \begin{bmatrix} \cos 90^\circ & -\sin 90^\circ \\ \sin 90^\circ & \cos 90^\circ \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \\ R_{180^\circ} &= \begin{bmatrix} \cos 180^\circ & -\sin 180^\circ \\ \sin 180^\circ & \cos 180^\circ \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \\ R_{270^\circ} &= \begin{bmatrix} \cos 270^\circ & -\sin 270^\circ \\ \sin 270^\circ & \cos 270^\circ \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \\ R_{45^\circ} &= \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix}, \\ R_{30^\circ} &= \begin{bmatrix} \cos 30^\circ & -\sin 30^\circ \\ \sin 30^\circ & \cos 30^\circ \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}, \\ R_{60^\circ} &= \begin{bmatrix} \cos 60^\circ & -\sin 60^\circ \\ \sin 60^\circ & \cos 60^\circ \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}. \end{align*}

We have seen that, given a linear transformation, we can write down the $2 \times 2$ matrix that represents it. On the other hand, if we are given a $2 \times 2$ matrix \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \end{align*} then we can ask what is the linear transformation that this matrix represent. Let us call this transformation $f$. By reverse engineering the process of writing down the matrix representation of $f$, we know that the first column of $A$ gives us the value of $f(1,0)$. So, \begin{align*} f(1,0) = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}. \end{align*} Moreover, the second column gives us the value of $f(0,1):$ \begin{align*} f(0,1) = \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}. \end{align*} Because $f$ is a linear transformation, we have that \begin{align*} f(x,y) = x\, f(1,0) + y\, f(0,1) = x\begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + y \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} = \begin{bmatrix} a_{11} x + a_{12}y \\ a_{21} x + a_{22}y \end{bmatrix} \end{align*} For example, consider the matrix \begin{align*} B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. \end{align*} The linear transformation it represents is \begin{align*} g(x,y) = \begin{bmatrix} x + 2y \\ 3x + 4y \end{bmatrix}. \end{align*} This transformation is neither a scaling nor a rotation though. It is just a random linear transformation.

15.6.3   Matrix representation of linear tranformation in general

In general, an $m \times n$ matrix represents a linear tranformation from $\Real^n$ to $\Real^m$. (Notice that the orders $n$ and $m$ swap here.) In other words, any $f:\Real^n \times m$ can be represented by an $m \times n$ table of numbers. To see this, observe that a point $\ve{x}$ \in $\Real^n$ is a vector \begin{align*} \ve{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \end{align*} where $x_1, x_2, x_3, \dotsc, x_n \in \Real$. Let $\hat{\ve{e}}_i$ denote the $n$-dimensional vector such that its $i$th component is $1$ and all the other components are $0$. This definition gives us the following $n$ vectors: \begin{align*} \hat{\ve{e}}_1 &= \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, & \hat{\ve{e}}_2 &= \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, & \hat{\ve{e}}_3 &= \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, & & \dotsb & \hat{\ve{e}}_n &= \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}. \end{align*} We can use them to rewrite $\ve{x}$ as follows. \begin{align*} \ve{x} &= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \\ &= \begin{bmatrix} x_1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ x_3 \\ \vdots \\ 0 \end{bmatrix} + \dotsb + \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ x_n \end{bmatrix} \\ &= x_1 \begin{bmatrix} 1\\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \dotsb, + x_n \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \\ &= x_1\,\hat{\ve{e}}_1 + x_2\,\hat{\ve{e}}_2 + x_3\,\hat{\ve{e}}_3 + \dotsb + x_n\,\hat{\ve{e}}_n. \end{align*} Given that $f$ is a linear transformation, we have that \begin{align*} f(\ve{x}) &= f(x_1\,\hat{\ve{e}}_1 + x_2\,\hat{\ve{e}}_2 + x_3\,\hat{\ve{e}}_3 + \dotsb + x_n\,\hat{\ve{e}}_n) \\ &= f(x_1\,\hat{\ve{e}}_1) + f(x_2\,\hat{\ve{e}}_2) + f(x_3\,\hat{\ve{e}}_3) + \dotsb + f(x_n\,\hat{\ve{e}}_n) \\ &= x_1\,f(\hat{\ve{e}}_1) + x_2\,f(\hat{\ve{e}}_2) + x_3\,f(\hat{\ve{e}}_3) + \dotsb + x_n\,f(\hat{\ve{e}}_n). \end{align*} As a result, the behavior of $f$ is determined completely by the values $f(\hat{\ve{e}}_1)$, $f(\hat{\ve{e}}_2)$, $\dotsc$, and $f(\hat{\ve{e}}_n)$. Because each of $f(\hat{\ve{e}}_i)$ is an $m$-dimensional vector, we can represent $f$ by $n \times m$ numbers. To write down the representation, let us first fix the symbols for the entries of the vectors. \begin{align*} f(\hat{\ve{e}}_1) &= \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ \vdots \\ a_{m1} \end{bmatrix}, & f(\hat{\ve{e}}_2) &= \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ \vdots \\ a_{m2} \end{bmatrix}, & & \dotsc, & f(\hat{\ve{e}}_n) &= \begin{bmatrix} a_{1n} \\ a_{2n} \\ a_{3n} \\ \vdots \\ a_{mn} \end{bmatrix}. \end{align*} We then do the same thing we did in the case of 2D transformations: we stick the vectors together horizontally and remove the brackets between them. What we get in the end is an $m \times n$ matrix. \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}. \end{align*}

Conversely, if we start with an $m \times n$ matrix $A$ above, then the matrix gives rise to a linear transformation $f$ where \begin{align*} f(\ve{x}) &= f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}\right) = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ \vdots \\ a_{m2} \end{bmatrix} + x_3 \begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \\ \vdots \\ a_{m3} \end{bmatrix} + \dotsb + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ a_{3n} \\ \vdots \\ a_{mn} \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dotsb + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dotsb + a_{2n}x_n\\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \dotsb + a_{3n}x_n\\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dotsb + a_{mn}x_n \end{bmatrix}. \end{align*} The above formula might seem complicated. A way to remember it is as follows: we multiply the $n$ numbers in the vector $\ve{x}$ with the $n$ columns of $A$, and then add the products together.

15.7   Matrix Operations

A matrix is a mathematical object which naturally has a number of operations like addition, subtraction, multiplication, and so on that we can perform on it. In this section, we will discuss these operations in details.

15.7.1   Transposition

The first operation we shall study is probably the easiest one because it does not involved any calculation. Given an $m \times n$ matrix $A$, the transpose of $A$, denoted by $A^T$, is a matrix whose $(i,j)$-entry is $a_{ji}$. More concretely, if \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & a_{24} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & a_{m4} & \cdots & a_{mn} \end{bmatrix}, \end{align*} then \begin{align*} A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ a_{13} & a_{23} & \cdots & a_{m3} \\ a_{14} & a_{24} & \cdots & a_{m4} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}. \end{align*}

The symbols in the above two equations might be confusing. Let us see a concrete example. Let \begin{align*} A = \begin{bmatrix} 1 & 11 & 111 & 1111 & 11111 \\ 2 & 22 & 222 & 2222 & 22222 \\ 3 & 33 & 333 & 3333 & 33333 \end{bmatrix}. \end{align*} Then, we have that \begin{align*} A^T = \begin{bmatrix} 1 & 11 & 111 & 1111 & 11111 \\ 2 & 22 & 222 & 2222 & 22222 \\ 3 & 33 & 333 & 3333 & 33333 \end{bmatrix}^T = \begin{bmatrix} 1 & 2 & 3 \\ 11 & 22 & 33 \\ 111 & 222 & 333 \\ 1111 & 2222 & 3333 \\ 11111 & 22222 & 33333 \end{bmatrix}. \end{align*} One can see that, when we transpose a matrix, we exchange the rows with the columns in the sense that the $i$th row of $A^T$ is the $i$th column of $A$, and the $i$th column of $A^T$ is the $i$th row of $A$.

If $A$ is a square matrix, the effect of transpoation is that we "flip" the matrix along its top-left-to-bottom-right diagonal (also known as the main diagonal): \begin{align*} \begin{bmatrix} 0 & 1 & 2 & 3 \\ -1 & 0 & 1 & 2 \\ -2 & -1 & 0 & 1 \\ -3 & -2 & -1 & 0 \end{bmatrix}^T &= \begin{bmatrix} 0 & -1 & -2 & -3 \\ 1 & 0 & -1 & -2 \\ 2 & 1 & 0 & -1 \\ 3 & 2 & 1 & 0 \end{bmatrix}. \end{align*}

We noted earlier that an $n$-dimensional vector can be thought of as a $n \times 1$ matrix. Let $\ve{a}$ be such a vector. \begin{align*} \ve{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} \end{align*}. We can transpose a vector like we transpose a matrix. The result is an $1 \times n$ matrix: \begin{align*} \ve{a}^T &= \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix}^T = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \end{bmatrix}. \end{align*} Despite the fact that $\ve{a}$ and $\ve{a}^T$ have the exact same elements, we must maintain that $\ve{a} \neq \ve{a}^T$ because they have different shapes. We also have different names for them. We call a vector like $\ve{a}$ a "column vector" because it looks like a column of a matrix, and we call $\ve{a}^T$ a "row vector" because it looks like a row of a matrix. We can add two column vectors together to get a column vector. Similarly, we can add two rows vectors together to get a new row vector. However, we cannot add a column vector to a row vector because they have different shapes. In the next section, though, we will learn about an operation that can be performed between a row vector and a column vector.

When we apply two transpositions in succession, we get the same matrix back. In other words, for any matrix $A$, \begin{align*} (A^T)^T = A. \end{align*} This is because transposition involves exchanging rows with columns. If we do this two times in a row, the $i$th column becomes the $i$th row after the first transposition and then becomes the $i$th column again after the second.

One interesting thing to note is that, for a square matrix, the transposition leaves the main diagonal unchanged and exchanges the triangle below the diagonal the triangle above it. If the matrix is symmetric around the main diagonal, then it remains unchanged under transposition. Such a matrix is called symmetric matrix. Formally, it is a matrix $A$ such that \begin{align*} A^T = A. \end{align*} An example of a symmmetric is the identity matrix $I$. This is because \begin{align*} I^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I. \end{align*} The scaling matrix $S_{a,b}$ is also symmetric. \begin{align*} S_{a,b}^T = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}^T = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} = S_{a,b}. \end{align*} The rotation matrix $R_{\theta}$ is generally not symmetric. This is because \begin{align*} R_{\theta} &= \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}, \\ R_{\theta}^T &= \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix}. \end{align*} We can see that $R_{\theta} = R_{\theta}^T$ if only if $\sin \theta = -\sin \theta$. This only happens if $\sin \theta = 0$, which means that $\theta = 2\pi k$ for some integer $k$. When this happens, $\cos \theta = 1$, so $R_{\theta} = I$, which is indeed symmetric. While $R_{\theta}^T \neq R_\theta$ in general, it is interesting to note that, because $\sin(-\theta) = -\sin \theta$ and $\cos(-\theta) = \cos\theta$, we have that \begin{align*} R_{\theta}^T &= \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix} = \begin{bmatrix} \cos (-\theta) & -\sin (-\theta) \\ \sin (-\theta) & \cos (-\theta) \end{bmatrix} = R_{-\theta}. \end{align*} So, the matrices $R_{\theta}$, representing the rotation by the angle $\theta$ in the counter-clockwise direction, is closely related to the matrix $R_{-\theta}$, which represents the rotation also by the $\theta$ but in the clockwise direction.

15.7.2   Matrix-vector multiplication

The operation that we shall discuss in this section is the core operation that captures the essense of a matrix. A matrix-vector mutliplication is only defined for a matrix and a vector of certain sizes. That is, if the matrix of size $m \times n$, then the vector must have $n$ dimensions. In other words, the number of columns much match the number of dimensions of the vectors. Given a $m \times n$ matrix \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}, \end{align*} and a $n$-dimensional vector \begin{align*} \ve{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}, \end{align*} the matrix-vector product between $A$ and $\ve{x}$, denoted by $A\ve{x}$, is defined to be the $m$-dimensional vector \begin{align*} A\ve{x} &= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ \vdots \\ a_{m2} \end{bmatrix} + x_3 \begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \\ \vdots \\ a_{m3} \end{bmatrix} + \dotsb + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ a_{3n} \\ \vdots \\ a_{mn} \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dotsb + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dotsb + a_{2n}x_n\\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \dotsb + a_{3n}x_n\\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dotsb + a_{mn}x_n \end{bmatrix}. \end{align*} Again, we take the the $n$ numbers in $\ve{x}$ and multiply them with the $n$ corresponding columns of $A$. Then, we add the multipled columns together to form a single $m$-dimensional vector. In fact, the matrix-vector product is exactly the same $A\ve{x}$ as the value $f(\ve{x})$ where $f$ is the linear transformation represented by $A$. Because a matrix is a representation of a linear transformation, and matrix-vector multiplication captures the effect of the represented linear transformation on the vector, this operation is what a linear transformation is all about. A linear transformation is a multiplcation of a vector by a matrix. Symbolically. for any linear transformation $f:\Real^n \ra \Real^m$ and for any $\ve{x} \in \Real^n$, we have that \begin{align*} f(\ve{x}) = A\ve{x} \end{align*} for some matrix $A$ of size $m \times n$. The last section tells us how to find what the matrix $A$ is based on the behavior of $f$ on $n$ fixed vectors $\hat{\ve{e}}_1$, $\hat{\ve{e}}_2$, $\dotsc$, $\hat{\ve{e}}_n$.

Let us see some examples of matrix-vector multiplication calculations. Consider the matrix \begin{align*} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}. \end{align*} Let us multiply it with a 2D vector $(3, -1)$. We have that \begin{align*} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} &= 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + (-1) \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} + \begin{bmatrix} -2 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \end{align*} If the vector is $(-2, 3)$, the result is \begin{align*} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ 3 \end{bmatrix} &= (-2) \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \end{bmatrix} + \begin{bmatrix} 6 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}. \end{align*} If we multiply the matrix with $\hat{\ve{e}}_1 = (1,0)$ and $\hat{\ve{e}}_2 = (0,1)$, we have \begin{align*} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \hat{\ve{e}}_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} &= 1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \end{align*} and \begin{align*} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \hat{\ve{e}}_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} &= 0 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \end{align*} So, from the two calculations above, we can easily infer that \begin{align*} A \hat{\ve{e}}_i = \mbox{the }i\mbox{th column of A}, \end{align*} given that $i$ is betwen 1 and the number of columns $A$.

15.7.3   Dot product as matrix-vector multiplication and vice versa

Interestingly, we can get a new insight onto matrix-vector multiplication by considering the special case of multiplying a row vector with a column vector. Recall again that a row vector $\ve{a}$ can be seen as an $n \times 1$ matrix: \begin{align*} \ve{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} \end{align*} As a result, its transpose $\ve{a}^T$ is a $1 \times n$ matrix. \begin{align*} \ve{a}^T = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \end{bmatrix}, \end{align*} and we call it a row vector. Given another $n$-dimensional column vector $\ve{b}$, \begin{align*} \ve{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}, \end{align*} we can certainly multiply $\ve{a}^T$, a $1 \times n$ matrix, to $\ve{b}$, an $n \times 1$ matrix. The result would be a $1 \times 1$ matrix below: \begin{align*} \ve{a}^T \ve{b} &= \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix} \\ &= b_1 \begin{bmatrix} a_1 \end{bmatrix} + b_2 \begin{bmatrix} a_2 \end{bmatrix} + \dotsb + b_n \begin{bmatrix} a_n \end{bmatrix} \\ &= \begin{bmatrix} a_1 b_1 + a_2 b_2 + \dotsb + a_n b_n \end{bmatrix}. \end{align*} A $1 \times 1$ matrix is just a table with one number; in other words, a scalar. Mathematicians thus prefers it to just treat just like a scalar. As a result, they simply remove the parentheses on the RHS and treat the result like a number. In other words, \begin{align*} \ve{a}^T \ve{b} = a_1 b_1 + a_2 b_2 + \dotsb + a_n b_n = \sum_{i=1}^n a_i b_i. \end{align*} So, multiplying a row vector with a column vector of the same size results in a scalar, and the process involves multiplying the corresponding entries together and adding up the products.

Note that, in Section 4.2.3, we learned about an operation called the "dot product" that is defined betwen two 3D vectors. In particular, given two vectors in $\ve{a} = (a_1, a_2, a_3)$ and $\ve{b} = (b_1, b_2, b_3)$, we have that \begin{align*} \ve{a} \cdot \ve{b} = a_1 b_1 + a_2 b_2 + a_3 b_3 = \sum_{i=1}^3 a_i b_i. \end{align*} In general, we can define the dot product between $n$-dimensional in a similar fashion: given $\ve{a} = (a_1, a_2, \dotsc, a_n)$ and $\ve{b}_n = (b_1, b_2, \dotsc, b_n)$, then we may define their dot product as \begin{align*} \ve{a} \cdot \ve{b} = a_1 b_1 + a_2 b_2 + \dotsb + a_n b_n = \sum_{i=1}^n a_i b_i. \end{align*} Obviously, \begin{align*} \ve{a} \cdot \ve{b} = \ve{a}^T \ve{b}, \end{align*} so the dot product between two (column) vectors can be seen as a matrix-vector multiplication between a row vector, obtained by transposing a vector, with the other vector.

Interestingly, the dot product can help us understand the formula for matrix-vector multiplication in a new light: it is a series of dot products! To see this, we need to interpret the matrix $A$ as a collection of $m$ rows. \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix} &= \begin{bmatrix} \ve{a}_1^T \\ \ve{a}_2^T \\ \vdots \\ \ve{a}_m^T \end{bmatrix}. \end{align*} Here, $\ve{a}_1$, $\ve{a}_2$, $\dotsc$, $\ve{a}_m$ are $n$-dimensional column vectors such that $\ve{a}_i^T$ is equal to the $i$th row of $A$. In other words, \begin{align*} \ve{a}_1^T &= \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \end{bmatrix} \\ \ve{a}_2^T &= \begin{bmatrix} a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \end{bmatrix} \\ & \qquad \qquad \qquad \vdots \\ \ve{a}_m^T &= \begin{bmatrix} a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}. \end{align*} Now, recall the formula for matrix-vector multiplication \begin{align*} A\ve{x} &= \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dotsb + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dotsb + a_{2n}x_n\\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \dotsb + a_{3n}x_n\\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dotsb + a_{mn}x_n \end{bmatrix}. \end{align*} Upon close inspect, we see that \begin{align*} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dotsb + a_{1n}x_n &= \ve{a}_1^T \ve{x}, \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dotsb + a_{2n}x_n &= \ve{a}_1^T \ve{x}, \\ & \vdots \\ a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dotsb + a_{mn}x_n &= \ve{a}_m^T \ve{x}. \end{align*} So, we may write \begin{align*} A\ve{x} &= \begin{bmatrix} \ve{a}_1^T \\ \ve{a}_2^T \\ \vdots \\ \ve{a}_m^T \end{bmatrix} \ve{x} = \begin{bmatrix} \ve{a}_1^T \ve{x} \\ \ve{a}_2^T \ve{x}\\ \vdots \\ \ve{a}_m^T \ve{x} \end{bmatrix}. \end{align*} In other words, we can see a matrix-vector multiplication as a series of dot products. We treat the rows of $A$ as row vectors. When we multiply $A$ with a vector $\ve{x}$, we compute the dot product of the first row with $\ve{x}$ and put the resulting number as the first entry of the resulting vector. Then, we do the same for the second row, the third row, the fourth row, and so on.

15.7.4   Matrix-matrix multiplcation

Matrix-matrix multiplication is an operation that takes in two matrices and produces a new matrix as a result. We can multiply a matrix $A$ with matrix $B$ only if the number of columns of $A$ is equal the number of rows of $B$. In other words, it must be the case that $A \in \Real^{m \times n}$ and $B \in \Real^{n \times r}$ for some integers $n$, $m$, and $r$. The product between $A$ and $B$ is denoted by $AB$, and it is matrix of dimension $m \times r$. (In other words, the middle number $n$ disappears.)

Let us talk about thes standard definition of matrix multiplication first. Let $C = AB$. Moreover, $a_{ij}$, $b_{jk}$, and $c_{jk}$ denote the entries of $A$, $B$, and $C$ respectively. In other words, when we write $C = AB$, we mean that \begin{align*} \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1r} \\ c_{21} & c_{22} & \cdots & c_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mr} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1r} \\ b_{21} & b_{22} & \cdots & b_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nr} \end{bmatrix}. \end{align*} The standard way to define matrix-matrix multiplication to relate the entries of $C$ to the entries of $A$ and $B$, and it is done with the following equation: \begin{align*} c_{ik} = \sum_{j=1}^n a_{ij}b_{jk}. \end{align*}

Let us slow down and see in more details the computation that the equation prescribes. It says that the entry on the Row $i$ and Column $j$ of $C$ is a sum of products of $n$ pairs of numbers. For each pair, one number comes from matrix $A$, and the other comes from matrix $B$. The numbers that comes from $A$ are $a_{i1}$, $a_{i2}$, $a_{i3}$, $\dotsc$, $a_{in}$. Upon closer inspection, these are the numbers on Row $i$ of $A$. \begin{align*} \begin{bmatrix} \vdots & \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & a_{i3} & \cdots & a_{in} \\ \vdots & \vdots & \vdots & & \vdots \end{bmatrix} \end{align*} The numbers that come from $B$ are $b_{1k}$, $b_{2k}$, $b_{3k}$, and $b_{nk}$, and these are the numbers of in Column $k$ of $B$. \begin{align*} \begin{bmatrix} \cdots & b_{1k} & \cdots \\ \cdots & b_{2k} & \cdots \\ \cdots & b_{3k} & \cdots \\ & \vdots & \\ \cdots & b_{nk} & \cdots \end{bmatrix}. \end{align*} Now that we have two lists of numbers, we multiple numbers at the same positions together to get another list of $n$ numbers: \begin{align*} a_{i1} b_{1k},\quad a_{i2} b_{2k},\quad a_{i3} b_{3k},\quad \dotsc,\quad a_{in} b_{nk}. \end{align*} Then, we add everything in the above list together to get $c_{ik}$ \begin{align*} c_{ik} = a_{i1} b_{1k} + a_{i2} b_{2k} + a_{i3} b_{3k} + \dotsb + a_{in} b_{nk} = \sum_{j=1}^n a_{ij}b_{jk}. \end{align*}

Applying the knowledge from the last section, we can also say that $c_{ij}$ is equal to the dot product of Row $i$ of $A$ and Column $k$ of $B$. \begin{align*} c_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & a_{i3} & \cdots & a_{in} \end{bmatrix} \begin{bmatrix} b_{1k} \\ b_{2k} \\ b_{3k} \\ \vdots \\ b_{nk} \end{bmatrix} \end{align*} So, if we write $A$ is a vertical stack of row vectors like in the last section \begin{align*} A = \begin{bmatrix} \ve{a}_1^T \\ \ve{a}_2^T \\ \vdots \\ \ve{a}_m^T \end{bmatrix}, \end{align*} and we write $B$ as a horizontal stack of column vectors \begin{align*} B = \begin{bmatrix} \ve{b}_1 & \ve{b}_2 & \cdots & \ve{b}_r \end{bmatrix}, \end{align*} then there is another way to write an equation for $c_{ij}:$ \begin{align*} c_{ik} = \ve{a}^T_i \ve{b}_k. \end{align*} This gives us another way to write down the definition of matrix-matrix multiplication. \begin{align*} AB = \begin{bmatrix} \ve{a}_1^T \\ \ve{a}_2^T \\ \vdots \\ \ve{a}_m^T \end{bmatrix} \begin{bmatrix} \ve{b}_1 & \ve{b}_2 & \cdots & \ve{b}_r \end{bmatrix} &= \begin{bmatrix} \ve{a}^T_1 \ve{b}_1 & \ve{a}^T_1 \ve{b}_2 & \cdots & \ve{a}^T_1 \ve{b}_r \\ \ve{a}^T_2 \ve{b}_1 & \ve{a}^T_2 \ve{b}_2 & \cdots & \ve{a}^T_2 \ve{b}_r \\ \vdots & \vdots & \ddots & \vdots \\ \ve{a}^T_m \ve{b}_1 & \ve{a}^T_m \ve{b}_2 & \cdots & \ve{a}^T_m \ve{b}_r \end{bmatrix}. \end{align*} I think this formula is a lot more memorizable than the equation that involves $c_{ij}$.

Let us also talk about another way to view matrix-matrix multiplication that is less discussed than the standard defintion. Looking at the RHS of the last equation. We see that the columns of $AB$ are \begin{align*} \begin{bmatrix} \ve{a}^T_1 \ve{b}_1 \\ \ve{a}^T_2 \ve{b}_1 \\ \vdots \\ \ve{a}^T_m \ve{b}_1 \end{bmatrix}, \quad \begin{bmatrix} \ve{a}^T_1 \ve{b}_2 \\ \ve{a}^T_2 \ve{b}_2 \\ \vdots \\ \ve{a}^T_m \ve{b}_2 \end{bmatrix}, \quad \begin{bmatrix} \ve{a}^T_1 \ve{b}_3 \\ \ve{a}^T_2 \ve{b}_3 \\ \vdots \\ \ve{a}^T_m \ve{b}_3 \end{bmatrix}, \quad \dotsc, \quad \begin{bmatrix} \ve{a}^T_1 \ve{b}_r \\ \ve{a}^T_2 \ve{b}_r \\ \vdots \\ \ve{a}^T_m \ve{b}_r \end{bmatrix}. \end{align*} We also learned in the last section that these are equal to \begin{align*} A\ve{b}_1, \quad A\ve{b}_2, \quad A\ve{b}_3, \quad \dotsc, \quad A\ve{b}_r. \end{align*} Hence, \begin{align*} A B = A \begin{bmatrix} \ve{b}_1 & \ve{b}_2 & \cdots & \ve{b}_r \end{bmatrix} = \begin{bmatrix} A \ve{b}_1 & A \ve{b}_2 & \cdots & A \ve{b}_r \end{bmatrix}. \end{align*} In other words, $AB$ is matrix with the same number of columns as $B$, and the Column $k$ of $AB$ is the result of multiplying $A$ with Column $k$ of $B$. Writing matrix-matrix multiplication this ways allows us to see that it is a generalization of matrix-vector multiplication. To repeat, matrix-matrix multiplication is just muliplying the matrix on the left to the columns of the matrix on the right and then stitching the results together.

15.6.5   Other matrix operations

15.7   Affine transformations

15.8   Two equivalent views of affine transformations

15.11   Summary


<< Contents