\( \def\sc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \newcommand{\dee}{\mathrm{d}} \newcommand{\Dee}{\mathrm{D}} \newcommand{\In}{\mathrm{in}} \newcommand{\Out}{\mathrm{out}} \newcommand{\pdf}{\mathrm{pdf}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\ve}[1]{\mathbf{#1}} \newcommand{\mrm}[1]{\mathrm{#1}} \newcommand{\ves}[1]{\boldsymbol{#1}} \newcommand{\etal}{{et~al.}} \newcommand{\sphere}{\mathbb{S}^2} \newcommand{\modeint}{\mathcal{M}} \newcommand{\azimint}{\mathcal{N}} \newcommand{\ra}{\rightarrow} \newcommand{\mcal}[1]{\mathcal{#1}} \newcommand{\X}{\mathcal{X}} \newcommand{\Y}{\mathcal{Y}} \newcommand{\Z}{\mathcal{Z}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\z}{\mathbf{z}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\sgn}{\mathrm{sgn}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\Real}{\mathbb{R}} \newcommand{\sseq}{\subseteq} \newcommand{\ov}[1]{\overline{#1}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \)

15   2D Transformations


So far, we have learned how to write GLSL shaders, and we have introduced abstractions to make working with them easier. What we can draw so far with OpenGL are images that can be computed in the fragment shader (Chapter 13) and colored meshes (Chapter 14 and many previous chapters).

In this chapter, we introduce a new concept: transformations. A transformation is a mathematical operation that we can apply to the geometric data (points, lines, triangles, camera positions, and so on) that make up our scene so that such data change their positions and shapes. For example, given a mesh, we can apply a transformation to change its position in the scene, rotate it to face other direction, or scale it so that it becomes 2 times or 3 times bigger. We can use transformations to change how the scene is rendered; for examples, changing the zoom level or moving the final rendering around the canvas. Transformation is a fundamental concept in computer graphics, and mastering it is a requirement for a competent graphics practitioner.

Transformation, however, is quite a complicated topic in it self. In the end, we want to cover transformations on 3D objects because WebGL is all about rendering 3D scenes. Nevertheless, such 3D transformations are quite complex by themselves. Moreover, we have not really discussed how to deal with 3D data so far besides setting the $z$-coordinates to zero so as to not worry about the third dimension like in the last chapter. So, in this chapter, we shall limit ourselves to the discussions of 2D transformations. They are much easier to deal with because they are very easy to visualize with what we have learned so far, and there are fewer types of 2D transformations we need to discuss.

15.1   Definitions and examples

Mathematically, a 2D transformation is a function that consumes a 2D point and outputs a 2D point. More formally, we say that $f$ is a 2D transformation if its maps $\Real^2$ to $\Real^2$. In other words, its signature is $$f: \Real^2 \ra \Real^2.$$ Let us look at some examples. One of the simplest transformations is the identity transformation $\mathtt{I}$, which simply outputs whatever the input is: \begin{align*} \mathtt{I}( \ve{x} ) = \ve{x} \end{align*} for every $\ve{x} \in \Real^2$. So, \begin{align*} I((0,0)) &= (0,0), & I((1,2)) &= (1,2),& I((-5,3)) &= (-5, 3), \end{align*} and so on. (To simplify the notation, we will omit double parentheses and write $f(-5, 3)$ in place of $f((-5,3))$ from now on.)

Other transformations are more complicated to define as they involve some parameters to define their behavior. An example of transformations with parameters are the constant functions $\mathtt{C}_{\ve{a}}$ where $\ve{a} \in \Real^2$ is the parameter. The constant function always outputs $\ve{a}$ no matter what the input is: \begin{align*} \mathtt{C}_{\ve{a}}(\ve{x}) = \ve{a} \end{align*} for all $\ve{x} \in \Real^2$. So, \begin{align*} \mathtt{C}_{(2,1)}(10, 15) &= (2,1), & \mathtt{C}_{(-1,1)}(0, 0) &= (-1,1), & \mathtt{C}_{(46,49)}(39, 888) &= (46,49). \end{align*} (Again, when the parameter is a 2D point, we may drop the parantheses around it to make the notation simpler. That is, we may write $\mathtt{C}_{2,1}$ to mena $\mathtt{C}_{(2,1)}$.) Note that, while there is only one identity transformation, there are as many constants functions as there are points in $\Real^2$.

A type of transformation that is used extensively in computer graphics is the translation. Like a constant function, a translation $\mathtt{T}_{\ve{a}}$ is parametered by a point $\mathbf{a} \in \Real^2$. What it does is to add $\ve{a}$ to the input: \begin{align*} \mathtt{T}_{\ve{a}}(\ve{x}) = \ve{x} + \ve{a}. \end{align*} So, \begin{align*} \mathtt{T}_{1,1}(5,6) &= (6,7), & \mathtt{T}_{-2,3}(7,8) &= (5,11), & \mathtt{T}_{-10,-20}(100,200) &= (90,180). \end{align*}

Lastly, another type of transformation that is used extensively in computer graphics is scaling. It is also parameterized by a point $\ve{a} \in \Real^2$, but it is better to treat the points as an ordered pair $(a_1, a_2)$. The scaling $\mathtt{S}_{a_1,a_2}$ simply multiplies $a_1$ to the $x$ component of the input and $a_2$ to the $y$-component of the input. In other words, \begin{align*} \mathtt{S}_{a_1, a_2}\bigg( \begin{bmatrix} x \\ y \end{bmatrix} \bigg) = \begin{bmatrix} a_1 x \\ a_2 y \end{bmatrix} \end{align*} So, \begin{align*} \mathtt{S}_{2,2}(10, 9) &= (20, 18), & \mathtt{S}_{4,5}(-3, 2) &= (-12, 10), & \mathtt{S}_{20,10}(-1,-2) &= (-20, -20). \end{align*}

15.2   Why transformations matter

So far, we have introduced transformation as a mathematical concept. This, however, is very abstract and quite removed from the visual world of computer graphics. However, transformations are used in graphics because they we can apply them to graphics data and see the results with our eyes. In this section, we shall see how we can apply transformations to meshes why they are useful to graphics practioners.

Meshes are made of vertices, and all vertices must have one attribute: their positions. For a 2D mesh, the position is a point in $\Real^2$. If we apply a 2D transformation to it, we get another position. In other words, when we apply a transformation to a vertex, we move it to a new position. So, if we apply the same transformation to all vertices in a mesh, we can change both the mesh's positions and shape. Let's see how the transformations we have learned so far affect 2D meshes.

For the identity transformation $\mathtt{I}$, we know that it does not change any vertex's position at all, so any mesh would remain unchanged if we apply the identity transformation to it. For the constant transformation $\mathbb{C}_{a,b}$, all vertices are moved $(a,b)$, and so the mesh is reduced to a single point. So, these two transformations are not useful on their own.

The translation $\mathtt{T}_{a,b}$ is a little more interesting. We know that it adds the vector $(a,b)$ to all vertex positions. For example, let us say we have a square mesh made with 4 vertices at $(0, 0)$, $(1,0)$, $(1,1)$, and $(0,1)$. If we apply $T_{-1,2}$ to the mesh, then the vertex position becomes $(-1,2)$, $(0, 2)$, $(0, 3)$, and $(-1, 3)$, respectively. The overall effect is presented in the figure below.

4 3 2 1 0 -1 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{T}_{-1,2}}}$ 4 3 2 1 0 -1 -2 -1 0 1 2 3
Figure 14.1 The effect of applying a translation to a square mesh.

In general, the effect of applying $\mathtt{T}_{a,b}$ to a mesh is to move the mesh to the right by $a$ units and then to upward by $b$ units. Note that $a$ and $b$ are signed. So, moving to the right by a negative number of units means moving to the left, and moving upward by a negative number of units means moving downward.

Next, let's look at scaling $\mathtt{S}_{a,b}$. When we apply it to the square mesh above, the vertices become $(0,0)$, $(a,0)$, $(a,b)$, and $(0,b)$. What the results looks like depends on the sign of $a$ and $b$. When $a$ and $b$ are both positive, it scales the mesh up $a$ times in the $x$-direction and $b$-times in the $y$-direction. Note that $a$ and $b$ can be less than $1$, which means that it can also shrinks (i.e., scales down) the mesh too.

4 3 2 1 0 -1 -1 0 1 2 3 4 ${\huge \xrightarrow{\mathtt{S}_{3,2}}}$ 4 3 2 1 0 -1 -1 0 1 2 3 4
4 3 2 1 0 -1 -1 0 1 2 3 4 ${\huge \xrightarrow{\mathtt{S}_{0.5,0.5}}}$ 4 3 2 1 0 -1 -1 0 1 2 3 4
Figure 14.2 The effects of scaling with positive scaling factors.

When $a$ is negative, the mesh "flips" horizontally. Any point to the right of the vertical line $x = 0$ is moved to the right side, and any point to the right would move to the left. The coordinate is scaled by a factor of $|a|$. Similarly, when $b$ is negative, the mesh "flips" vertically. Points above the horizontal line $y = 0$ go under, and points under the line go above.

3 2 1 0 1 -2 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{S}_{-1,-1}}}$ 3 2 1 0 1 -2 -2 -1 0 1 2 3
3 2 1 0 1 -2 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{S}_{-1,2}}}$ 3 2 1 0 1 -2 -2 -1 0 1 2 3
3 2 1 0 1 -2 -2 -1 0 1 2 3 ${\huge \xrightarrow{\mathtt{S}_{3,-2}}}$ 3 2 1 0 1 -2 -2 -1 0 1 2 3
Figure 14.3 The effects of scaling with negative scaling factors.

One thing to nice is that, for scaling, there is a special point: the origin $(0,0)$. The origin is never changed by a scaling because multipying $0$ with any number always result in $0$. Other points are moved according to the scaling factors $a$ and $b$. It is as though "scape" itself is warped around $(0,0)$. So, we may call $(0,0)$ the center of scaling. To repeat, it is very important to remember that scaling always happen around a point. On the other hand, there is no "center of translation" because a translation move all points in the same way.

From the examples we have seen so far, we can see why transformations are useful to computer graphics. Using translation, we can move a mesh around the scene. Using scaling, we can make a mesh larger or smaller to our liking. Using these two transformations togeter allows us to resize a mesh and place it to where we like in a scene.

15.3   Transformation arithematic

Like many other mathematical objects, we can perform operations such as addition, subtraction, and multiplication on transformations to another transformation. This allows us to build more complex transformations from simpler transformations, and it makes transformation another kind of "numbers," much like vectors. In this section, we will learn about three operations on transformations that are used widely in computer graphics: composition, addition, and scalar multiplication.

15.3.1   Transformation Equality

Let us talk, though, about how to decide whether two transformations are the same or not first. This will help us make more sense of the operations we are about to learn. We say that two transformations $f$ and $g$ are equal if the results of applying $f$ and $g$ to 2D any point $\ve{x}$ are the same. In mathematcal notations:

$f = g$ if $f(\ve{x}) = g(\ve{x})$ for all $\ve{x} \in \Real^2$.

In other words, $f$ and $g$ have the exact same effect on every point. We can swap $g$ in place of $f$, apply the function, and no one would notice any differences in the result. If there is a point $\ve{x}$ where $f(\ve{x}) \neq g(\ve{x})$, then $f$ and $g$ are not equal, and we would write $f \neq g$.

Let us observe equality between transformations we have seen so far. The identity transformation $\mathtt{I}$ returns the input $\ve{x}$ exactly without change. So does the translation by $(0,0)$, $\mathbb{T}_{0,0}$, and the scaling by $(1,1)$, $\mathtt{s}_{1,1}$. As a result, $$ \mathtt{I} = \mathtt{T}_{0,0} = \mathtt{S}_{1,1}.$$

The case above is the only case where a translation can be equal to a scaling. If $(a,b) \neq (0,0)$, them $\mathtt{T}_{a,b} \neq \mathtt{S}_{c,d}$ for any $(c, d) \in \Real^2$. This is because there is a point, $(0,0)$, where the effects of the two transformations differ: \begin{align*} T_{a,b}(0,0) = (a,b) \neq (0,0) = S_{c,d}(0,0). \end{align*}

15.3.2   Transformation Composition

With the discussion on equality out of the way, let us talk about the first operation on transformations. Given a 2D transformation $f: \Real^2 \ra \Real^2$ and another 2D transformation $g: \Real^2 \ra \Real^2$, we can create another transformation $h$ by just applying $f$ and then $g$. So, the defintion of $h$ is given by the equation $$h(\ve{x}) = g(f(\ve{x}))$$ for all $\ve{x} \in \Real^2$. Here, $h$ is called the composition of $f$ and $g$ and is denoted by $g \circ f$. As a result, we can write: $$(g \circ f)(\ve{x}) = g(f(\ve{x})).$$

Let us see an example. Let $f = \mathtt{T}_{1,0}$ and $g = \mathtt{S}_{2,2}$. We have that \begin{align*} (g \circ f)(x,y) &= g(f(x,y)) = \mathtt{S}_{2,2}(\mathtt{T}_{1,0}(x,y)) = \mathtt{S}_{2,2}(x+1, y) = (2x+2, 2y), \\ (f \circ g)(x,y) &= f(g(x,y)) = \mathtt{T}_{1,0}(\mathtt{S}_{2,2}(x,y)) = \mathtt{T}_{1,0}(2x, 2y) = (2x+1, 2y). \end{align*} It should be clear that, in this case, $f \circ g \neq g \circ f$ because their effects are different on all points. In generally, scaling then translation is not the same as translation then scaling. Composition is not commutative. The order at which we apply transformations matters.

On the other hand, if the order of transformations is fixed, it does not matter how we group the transformations in this order. More concretely, suppose we have three transformation $f$, $g$, and $h$ that we will apply to a point $\ve{x}$ in that order. The result is always $$h(g(f(\ve{x}))).$$ It does not matter if we first compose $f$ and $g$ to get $g\circ f$ and then compose it with $h$ before applying the result. In other words, \begin{align*} (h \circ (g \circ f))(\ve{x}) = h((g\circ f)\ve{x}) = h(g(f(\ve{x}))). \end{align*} It also does not matter if we compose $g$ and $h$ to get $h\circ g$ and then compose it with $f$ later: \begin{align*} ((h \circ g) \circ f)(\ve{x}) = (h \circ g)(f(\ve{x})) = h(g(f(\ve{x}))). \end{align*} This is because, in the end, all the transformations are applied in the same order. Mathematically, this fact is captured by the associative property: \begin{align*} (h \circ g) \circ f = h \circ (g \circ f) \end{align*} for any transformations $f$, $g$, and $h$. As a result, when we a long chain of transformations such as $f_5 \circ f_4 \circ f_3 \circ f_2 \circ f_1$, there is no need to write any parentheses between them because any parenthesization would yield the same transformation nonetheless. To repeat, composition is associative. As long as the order does not change, different groupings of transformations within the order do not produce different results.

While composition is in general not commutative, there are special cases where they are. The first special case is that the identity transformation commutes with any transformation: for any $f \in \Real^2 \ra \Real^2$, we have that \begin{align*} \mathtt{I} \circ f = f \circ \mathtt{I}. \end{align*} The second special case is that two translations commute with each other: \begin{align*} \mathtt{T}_{a,b} \circ \mathtt{T}_{c,d} = \mathtt{T}_{a+c, b+d} = \mathtt{T}_{c,d} \circ \mathtt{T}_{a,b}. \end{align*} So do two scalings: \begin{align*} \mathtt{S}_{a,b} \circ \mathtt{S}_{c,d} = \mathtt{S}_{ac, bd} = \mathtt{S}_{c,d} \circ \mathtt{S}_{a,b}. \end{align*} Commutativity generally breaks when you start mixing different types of transformations together.

15.3.3   Transformation Addition

A 2D transformation produces a 2D vector, which can be added. We can take this operation on vectors and then "lift" it to become an operation on transforamtions. More precisely, let $f$ and $g$ be two 2D transformations, the sum of $f$ and $g$ is the transformation $h$ such that \begin{align*} h(\ve{x}) = f(\ve{x}) + g(\ve{x}). \end{align*} For convenience, we shall write the sum of $f$ and $g$ as $f + g$. This gives: \begin{align*} (f+g)(\ve{x}) = f(\ve{x}) + g(\ve{x}). \end{align*} Transformation addition inherits many properties from vector addition (and addition in general). It is commutative and associative.

For the composition operator, the identity transformation $\mathtt{I}$ is the magic transformation that, when composed with another transformation, would leave the other transformation unchanged. For addition, this is no longer the case. This is because \begin{align*} (\mathtt{I} + f)(\ve{x}) = \mathtt{I}(\ve{x}) + f(\ve{x}) = \ve{x} + f(\ve{x}). \end{align*} We have that, the RHS would not be equal to the LHS unless $\ve{x} = (0,0)$, and so $I+f \neq f$. The magic transformatioin in this case is the constant transformation $\mathtt{C}_{0,0}$: \begin{align*} (\mathtt{C}_{0,0} + f)(\ve{x}) = \mathtt{C}_{0,0}(\ve{x}) + f(\ve{x}) = (0,0) + f(\ve{x}) = f(\ve{x}). \end{align*}

Adding two scalings together results in a scaling where the parameters are added. \begin{align*} (\mathtt{S}_{a,b} + \mathtt{S}_{c,d})(x,y) &= \mathtt{S}_{a,b}(x,y) + \mathtt{S}_{c,d}(x,y) \\ &= (ax, by) + (cx, dy) \\ &= \big( (a+c)x, (b+d) y \big) \\ &= \mathtt{S}_{a+c,b+d}(x,y). \end{align*} So, \begin{align*} \mathtt{S}_{a,b} + \mathtt{S}_{c,d} = \mathtt{S}_{a+c,b+d} \end{align*}

However, adding two translations together do not produce the same result as above. \begin{align*} (\mathtt{T}_{a,b} + \mathtt{T}_{c,d})(x,y) &= \mathtt{T}_{a,b}(\ve{x}) + \mathtt{T}_{c,d}(x,y) \\ &= (x+a, y+b) + (x+c,y+d) \\ &= (2x+a+b, 2y+b+d) \\ &= (x,y) + \big((x,y) +(a+c,b+d)) \\ &= \mathtt{I}(x,y) + \mathtt{T}_{a+c,b+d}(x,y) \end{align*} So, \begin{align*} \mathtt{T}_{a,b} + \mathtt{T}_{c,d} = \mathtt{I} + \mathtt{T}_{a+c,b+d} \end{align*}

In fact, we can say that a translation is the identity transformation added to a constant transformation:

\begin{align*} \mathtt{T}_{a,b}(x,y) &= (x+a,y+b) \\ &= (x,y) + (a,b) \\ &= \mathtt{I}(x,y) + \mathtt{C}_{a,b}(x,y) \\ &= (\mathtt{I} + \mathtt{C}_{a,b})(x,y), \end{align*} which gives us \begin{align*} \mathtt{T}_{a,b} = \mathtt{I} + \mathtt{C}_{a,b}. \end{align*}

15.3.4   Transformation Scalar Multiplication

Similar to the previous section, we can take another vector operation, multiplication by a scalar, and define an operation on transformation with it. Let $f$ be a 2D transformation and $c \in \Real$ be a scalar. The product of $f$ and $c$ is a transformation $g$ such that \begin{align*} g(\ve{x}) = c f(\ve{x}) \end{align*} for all $\ve{x} \in \Real^2$. We write the product as $cf$. Note that the scalar multiplication is conducted to after we apply the transformation. The order is important here.

In fact, muliplication by scalar is actually the same as composition with a scaling: \begin{align*} (c f)(\ve{x}) = cf(\ve{x}) = \mathtt{S}_{c,c}(f(\ve{x})) = (\mathtt{S}_{c,c} \circ f)(\ve{x}), \end{align*} and so \begin{align*} cf = \mathtt{S}_{c,c} \circ f. \end{align*}

The reader might think it is pointless to introduce a new operation that can be defined in terms of previous ones. However, scalar multiplication gives us a a way to write expressions involving transformation more succinctly. A scaling $\mathtt{S}_{c,c}$ where the two scaling parameters are equal is called a uniform scaling. We can replace a composition with such a scaling (which we would write $\mathtt{S}_{c,c} \circ$) with just $c$.

Another convenience afforded by scalar multiplication is the definition of transformation subtraction, which we can defined as: \begin{align*} f - g = f + (-1)g. \end{align*}

Scalar multiplication inherits properties from the multiplication operators for real numbers. For one, the order we multiply the scalars do not matter: \begin{align*} c(df) = (cd)f = (dc)f = d(cf). \end{align*} When paired with transformation addition, we also have the distributive rules: \begin{align*} c(f + g) &= cf + cg \\ (c + d)f &= cf + df \end{align*} In fact, one may show that the distributive rules also apply to scaling transformations. \begin{align*} \mathtt{S}_{a,b} \circ (f + g) &= \mathtt{S}_{a,b} \circ f + \mathtt{S}_{a,b} \circ g \\ (\mathtt{S}_{a,b} + \mathtt{S}_{c,d}) \circ f &= \mathtt{S}_{a,b} \circ f + \mathtt{S}_{c,d} \circ f \end{align*}

15.4   Linear Transformations

We have studied some simple types of transformations. It is time to study another type of transformations that is more complicated than the one we have encountered before: linear transformation. Linear transformations are widely used in computer graphics because they encompass useful transformations and also because they can easily represented by data, not programs.

We say that a transformation (or a function) $f: \Real^2 \ra \Real^2$ is linear if the following conditions are true:

  1. $f(\ve{x} + \ve{y}) = f(\ve{x}) + f(\ve{y})$ for any $\ve{x}, \ve{y} \in \Real^2$, and
  2. $f(c\ve{x}) = cf(\ve{x})$ for all $\ve{x} \in \Real^2$ and $c \in \Real$.
Condition (1) says that the transformation $f$ "distributes" like how a multiplication with scalar interacts with addition. Condition (2) says that the function "commutes" with scalar multiplcation.

Let's classify the transformations we have seen so far whether they are linear or not. The constant function is generally not linear. If $(a,b) \neq (0,0)$ we have that \begin{align*} \mathtt{C}_{a,b}\big((x,y) + (0,0)\big) &= \mathtt{C}_{a,b}(x,y) \\ &= (a,b) \\ &\neq (a,b) + (a,b) \\ &= \mathtt{C}_{a,b}(x,y) + \mathtt{C}_{a,b}(0,0). \end{align*} The reader should check that $\mathtt{C}_{0,0}$ is linear.

Translation is also not linear if $(a,b) \neq (0,0)$. This is because \begin{align*} \mathtt{T}_{a,b}((x,y) + (0,0)) &= \mathtt{T}_{a,b}((x,y) + (0,0)) \\ &= (x,y) + (a,b) \\ &\neq (x,y) + (a,b) + (a,b) \\ &= \mathtt{T}_{a,b}(x,y) + \mathtt{T}_{a,b}(0,0) \end{align*} The only translation that is linear is $\mathtt{T}_{0,0} = \mathtt{I}$.

On the other hand, scaling is linear. This is because \begin{align*} \mathtt{S}_{a,b}((x_0, y_0) + (x_1, y_1)) &= \mathtt{S}_{a,b}(x_0 + x_1, y_0 + y_1) \\ &= (a(x_0 + x_1), b(y_0 + y_1)) \\ &= (ax_0 + ax_1, by_0 + by_1) \\ &= (ax_0, by_0) + (ax_1, by_1) \\ &= \mathtt{S}_{a,b}(x_0, y_0) + \mathtt{S}_{a,b}(x_1, y_1), \end{align*} and \begin{align*} \mathtt{S}_{a,b}(c(x, y)) &= \mathtt{S}_{a,b}(c(x, y)) \\ &= \mathtt{S}_{a,b}(cx, cy) \\ &= (acx, bcy) \\ &= c(ax,by) \\ &= c\,\mathtt{S}_{a,b}(x, y). \end{align*} Because $\mathtt{I} = \mathtt{S}_{1,1}$, we have that the identity transformation is linear as well.

One characteristic of a linear transformation is that it must preserve the point $(0,0)$. Let us say that $f$ is linear, then \begin{align*} f(0,0) = f((0,0) + (0,0)) = f(0,0) + f(0,0) = 2f(0,0), \end{align*} which meanes that $f(0,0) = (0,0)$. This is one of the reason why, if $(a,b) \neq (0,0)$, then $\mathtt{C}_{a,b}$ and $\mathtt{T}_{a,b}$ are not linear.

15.5   2D Rotation

In Section 15.1, we learned about two types of transformations that are important to computer graphics: translation and scaling. There is a third type: rotation. In this chapter, we shall learn about rotations in 2D, and we will study rotations in 3D later in Chapter XXX.

In 2D, a rotation has a parameter, the angle $\theta$, and is denoted by $\mathtt{R}_\theta$. Given a point $\mathbf{x}$, the action of $\mathtt{R}_\theta$ on $\ve{x}$ is as follows.

  1. Interpret $\ve{x}$ as a vector from point $(0,0)$ to $\ve{x}$.
  2. Rotate the vector by angle $\theta$ in the counterclockwise direction.

Let us take the above description step by step. Let us consider a rotation by $\theta = 90^\circ$, and let us apply it to a point, say $(2,2)$. The first thing to do is to visualize $(2,1)$ on the 2D plane.

3 2 1 0 1 -2 -2 -1 0 1 2 3
Figure 14.4 The point $(2,1)$ visualized on the 2D plane.

Then, we should interpret the point as a vector from $(0,0)$, the origin. This means we draw an arrow from $(0,0)$ to $(1,2)$ and forget, for the moment, that the point ever exists.

3 2 1 0 1 -2 -2 -1 0 1 2 3
Figure 14.5 The point $(2,1)$ interpreted as a vector.

Next, we rotate the vector by $90^\circ$ in the counterclockwise rotation, using the point $(0,0)$ as the center of rotation.

3 2 1 0 1 -2 -2 -1 0 1 2 3 90°
Figure 14.6 The vector $(2,1)$ being rotated by $90^\circ$.

Lastly, we interpret the rotated vector as a point. The result is $(-1,2)$. In other words, $\mathtt{R}_{90^\circ}(2,1) = (-1,2).$

3 2 1 0 1 -2 -2 -1 0 1 2 3 90°
Figure 14.7 The rotated vector interpretted as a point.

Let us try to derive a formula for rotation. Say, we are given a point $(x,y)$, which we would like to rotate by the angle $\theta$. We know that we can always find a real number $r \geq 0$ and $\phi$ such that \begin{align*} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} r \cos \phi \\ r \sin \phi \end{bmatrix}. \end{align*} This is basically writing $(x,y)$ in terms of polar coordinates (Chapter 4.5.1). In other words, we interpet the point as a vector of length $r$ that makes an anle $\phi$ with the positive $x$-axis. After we rotate this vector by the angle $\theta$, the vector length remains the same. (In other words, $r$ does not change.) However, the angle that the vector makes with the positive $x$-axis changes from $\phi$ to $\phi + \theta$. Symbolically, this means that \begin{align*} \mathtt{R}_{\theta} \left( \begin{bmatrix} r \cos \phi \\ r \sin \phi \end{bmatrix} \right) = \begin{bmatrix} r \cos (\phi + \theta) \\ r \sin (\phi + \theta) \end{bmatrix} \end{align*} Let us now recall the rules for sine and cosine addition from high school. \begin{align*} \cos(\phi + \theta) &= \cos \phi \cos\theta - \sin\phi \sin\theta \\ \sin(\phi + \theta) &= \sin \phi \cos\theta + \cos\phi \sin\theta. \end{align*} As a result, \begin{align*} \begin{bmatrix} r \cos (\phi + \theta) \\ r \sin (\phi + \theta) \end{bmatrix} &= \begin{bmatrix} r \cos \phi \cos\theta - r \sin\phi \sin\theta \\ r \sin \phi \cos\theta + r \cos\phi \sin\theta \end{bmatrix} \\ &= \begin{bmatrix} (r \cos \phi) \cos\theta - (r \sin\phi) \sin\theta \\ (r \sin \phi) \cos\theta + (r \cos\phi) \sin\theta \end{bmatrix} \\ &= \begin{bmatrix} x \cos\theta - y \sin\theta \\ y \cos\theta + x \sin\theta \end{bmatrix} \\ &= \begin{bmatrix} x \cos\theta - y \sin\theta \\ x \sin\theta + y \cos\theta \end{bmatrix}. \end{align*} In other words, \begin{align*} \mathtt{R}_\theta(x,y) = \begin{bmatrix} x \cos\theta - y \sin\theta \\ x \sin\theta + y \cos\theta \end{bmatrix}. \end{align*}

From the formula, we can deduce that a rotation is a linear transformation. This is because \begin{align*} \mathtt{R}_{\theta}((a,b) + (c,d)) &= \mathtt{R}_{\theta}(a+c, b+d) \\ &= \begin{bmatrix} (a+c) \cos\theta - (b+d) \sin\theta \\ (a+c) \sin\theta + (b+d) \cos\theta \end{bmatrix} \\ &= \begin{bmatrix} a \cos\theta - b \sin\theta \\ a \sin\theta + b \cos\theta \end{bmatrix} + \begin{bmatrix} c \cos\theta - d \sin\theta \\ c \sin\theta + d \cos\theta \end{bmatrix} \\ &= \mathtt{R}_{\theta}(a, b) + \mathtt{R}_{\theta}(c, d), \end{align*} and \begin{align*} \mathtt{R}_{\theta}\big(c\,(a,b)\big) &= \mathtt{R}_{\theta}(ca,cb) \\ &= \begin{bmatrix} ca \cos\theta - cb \sin\theta \\ ca \sin\theta + cb \cos\theta \end{bmatrix} \\ &= c \begin{bmatrix} a \cos\theta - b \sin\theta \\ a \sin\theta + b \cos\theta \end{bmatrix} \\ &= c\, \mathtt{R}_{\theta}(a,b). \end{align*}

Because a rotation, it is a linear transformation, we have that $\mathtt{R}_{\theta}(0,0)$. This makes sense because, after all, the point $(0,0)$ is the center of rotation, which will not due to the rotation. The rotation by $0^\circ$ leaves all points unchanged because there is no rotation, and so \begin{align*} \mathtt{R}_{0^\circ} = \mathtt{I} = \mathtt{T}_{0,0}. \end{align*}

15.6   Matrices

We said earlier in Section 15.4, we said that linear transformations are widely used in computer graphics because they can be represented by data. In this section, we shall study matrices (singular: matrix), the mathematical objects and the data type that can be used to repreesent linear transformations.

Earlier in Section 4.1.1, we learned that all 2D vectors can be written in terms of two special vectors $\hat{\ve{x}} = (1,0)$ and $\hat{\ve{y}} = (0,1)$. More, specifically, we have that \begin{align*} (x,y) = (x, 0) + (0,y) = x\,(1,0) + y\,(0,1) = x\hat(\ve{x}) + y \hat(\ve{y}). \end{align*}

Now, let $f$ be a linear transformation, we have that \begin{align*} f(x,y) = f(x\hat{\ve{x}} + y \hat{\ve{y}}) = x\,f(\hat{\ve{x}}) + y\,f(\hat{\ve{y}}) = x\,f(1,0) + y\,f(0,1). \end{align*} The above equation is kind of remarkable. It says that, if we know the effect of a linear transformations $f$ on just two vectors—$\hat{\ve{x}} = (1,0)$ and $\hat{\ve{y}} = (0,1)$—then we know what it does to all the other vectors. If we are given $(x,y)$, we just multiple $x$ to $f(1,0)$ and $y$ to $f(0,1)$ and then add the results together.

This means that we can represent any linear transformations $f$ by $4$ numbers. This is because $f(1,0)$ is a 2D vector, which can be represented by 2 numbers, and so is $f(0,1)$. Let us give names to the components of these vectors. \begin{align*} f(1,0) &= \begin{bmatrix} a \\ b \end{bmatrix}, & f(0,1) &= \begin{bmatrix} c \\ d \end{bmatrix}. \end{align*} The conventional way to write down a representation of $f$ is to take the vectors $f(1,0)$ and $f(0,1)$ side by side \begin{align*} \begin{bmatrix} a \\ b \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} \end{align*} and then remove the borders between them to get a table of numbers \begin{align*} \begin{bmatrix} a & c \\ b & d \end{bmatrix}. \end{align*} This is called a matrix.

15.6.1   Definitions and Notations

Formally, a matrix is a two-dimensional table of numbers. We write down the table and surround them by square brackets. For example,

\begin{align*} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix}. \end{align*}

Each number in the matrix is called an element, so the matrices above has 12 elements. A matrix is divided vertically into rows

1 2 3 4 4 6 7 8 9 10 11 12

and horizontally into columns.

1 2 3 4 4 6 7 8 9 10 11 12

A matrix with $m$ rows and $n$ columns are said to be of size $m \times n$, so the matrix above is of size $3 \times $. We index the rows with natural numbers: the top row is Row 1, the second row from the top is Row 2, and so on. We index the columns similarly from left to right: the leftmost column is Column 1, the second leftmost column is Column 2, and so on.

1 2 3 4 4 6 7 8 9 10 11 12 1 2 3 1 2 3 4

We refer to an element in the matrix with the indices of its row and column, in that order. As a result, the top-left entry of the matrix above is Element $(1,1)$, and the bottom-right element is Element $(3,4)$. The Element $(2,3)$ is the element on Row 2 and Column 3, which is $7$. On the other hand, the element $(3,2)$ is the Element on Row 3 and Column 2, which is $10$.

We often use a upper-case letter such as $A$, $B$, and $C$ to represent a matrix, and we use the corresponding lower-case letter to represents its element.The symbol for the row element is subscripted with the indices of the row and the column, in that order. So, Element $(2,3)$ of matrix $A$ is denoted by $a_{23}$, and Element $(3,2)$ is denoted by $a_{32}$. So, if $A$ is a $3 \times 4$ matrix, then it can be written as follows. \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{33} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}. \end{align*} In case $A$ is of size $m \times n$, then \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{33} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots &\vdots &\ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}. \end{align*}

We often write 2D and 3D vectors using the "columm vector notation". \begin{align*} \ve{a} &= \begin{bmatrix} a_x \\ a_y \end{bmatrix}, & \ve{b} &= \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix}. \end{align*} Notice that a 2D vector looks just like a $2 \times 1$ matrix, and a 3D vector a $3 \times 1$ matrix. In fact, while they are not exactly the same thing, we can always interpret an $n$-dimensional vector as a $n \times 1$ matrix, and we shall do so when it is convenient to do so from now on.

All of the matrices we will deal with in this book, the entries are real numbers. The set of $m \times n$ matrices whose entries are real numbers are denoted by the symbol $\mathbb{R}^{m \times n}$.

15.6.2   Matrix representation of 2D transformations

As noted earlier briefly before the beginning of the last section, the matrix representation of a 2D linear transformation $f$ is obtained by the following process.

  1. Compute $f(1,0)$ and $f(0,1)$.
  2. Form a $2 \times 2$ matrix whose first column is the column vector $f(1,0)$ and whose second column is the column vector $f(0,1)$.

Let's see what the matrix representations of the 2D linear transformations we have learn so far look like. The simplest of these transformations is the identity transformation $\mathtt{I}$. Because \begin{align*} \mathtt{I}(1,0) &= \begin{bmatrix} 1 \\ 0 \end{bmatrix}, & \mathtt{I}(0,1) &= \begin{bmatrix} 0 \\ 1 \end{bmatrix},& \end{align*} the matrix representation of $\mathtt{I}$ is the matrix \begin{align*} I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \end{align*} The matrix $I$ has a special name: it is called the identity matrix after the transformation that it represents.

As we will see later, a matrix that represents a linear transformation in an $n$-dimension space will be of size $n \times n$. That is, there are as many rows as there are columns. Such a matrix is called a square matrix.

Next, let us look at the scaling transformation $\mathtt{S}_{a,b}$. Because \begin{align*} \mathtt{S}_{a,b}(1,0) &= \begin{bmatrix} a \\ 0 \end{bmatrix}, & \mathtt{S}_{a,b}(0,1) &= \begin{bmatrix} 0 \\ b \end{bmatrix},& \end{align*} the matrix representation of $\mathtt{S}_{a,b}$ is the matrix \begin{align*} S_{a,b} = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}. \end{align*}

The matrix representation for the rotation $\mathtt{R}_{\theta}$ is more complicated. Note that \begin{align*} \mathtt{R}_{\theta}(1,0) &= \begin{bmatrix} \cos \theta \\ \sin\theta \end{bmatrix}, & \mathtt{R}_{\theta}(0,1) &= \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.& \end{align*} As a result, the matrix representation of $\mathtt{R}_\theta$ is \begin{align*} R_{\theta} = \begin{bmatrix} \cos \theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \end{align*} We can substitute angles whose sine and cosine values are well-known to see what the matrices of the rotations by these angles are. For examples, \begin{align*} R_{90^\circ} &= \begin{bmatrix} \cos 90^\circ & -\sin 90^\circ \\ \sin 90^\circ & \cos 90^\circ \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \\ R_{180^\circ} &= \begin{bmatrix} \cos 180^\circ & -\sin 180^\circ \\ \sin 180^\circ & \cos 180^\circ \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \\ R_{270^\circ} &= \begin{bmatrix} \cos 270^\circ & -\sin 270^\circ \\ \sin 270^\circ & \cos 270^\circ \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \\ R_{45^\circ} &= \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix}, \\ R_{30^\circ} &= \begin{bmatrix} \cos 30^\circ & -\sin 30^\circ \\ \sin 30^\circ & \cos 30^\circ \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}, \\ R_{60^\circ} &= \begin{bmatrix} \cos 60^\circ & -\sin 60^\circ \\ \sin 60^\circ & \cos 60^\circ \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}. \end{align*}

We have seen that, given a linear transformation, we can write down the $2 \times 2$ matrix that represents it. On the other hand, if we are given a $2 \times 2$ matrix \begin{align*} A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \end{align*} then we can ask what is the linear transformation that this matrix represent. Let us call this transformation $f$. By reverse engineering the process of writing down the matrix representation of $f$, we know that the first column of $A$ gives us the value of $f(1,0)$. So, \begin{align*} f(1,0) = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}. \end{align*} Moreover, the second column gives us the value of $f(0,1):$ \begin{align*} f(0,1) = \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}. \end{align*} Because $f$ is a linear transformation, we have that \begin{align*} f(x,y) = x\, f(1,0) + y\, f(0,1) = x\begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + y \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} = \begin{bmatrix} a_{11} x + a_{12}y \\ a_{21} x + a_{22}y \end{bmatrix} \end{align*} For example, consider the matrix \begin{align*} B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. \end{align*} The linear transformation it represents is \begin{align*} g(x,y) = \begin{bmatrix} x + 2y \\ 3x + 4y \end{bmatrix}. \end{align*} This transformation is neither a scaling nor a rotation though. It is just a random linear transformation.

15.6.3   Linear transformation in general

We just learned that a $2 \times 2$ matrix represents a 2D linear transformation. That is, it represents a function $f: \Real^2 \ra \Real^2$ that is linear. In other words, the function $f$ satisfies the following key properties.

  1. $f(\ve{x} + \ve{y}) = f(\ve{x}) + f(\ve{y})$ for all $\ve{x}, \ve{y} \in \Real^2$.
  2. $f(c\ve{x}) = c\,f(\ve{x})$ for all $c \in \Real$ and $\ve{x} \in \Real^2$.

In general, an $m \times n$ matrix represents a linear tranformation from $\Real^n$ to $\Real^m$. (Notice that the orders $n$ and $m$ swap.) In other words, it represents a function $f: \Real^n \ra \Real^m$ that is linear; i.e., satisfying the following properties.

  1. $f(\ve{x} + \ve{y}) = f(\ve{x}) + f(\ve{y})$ for all $\ve{x}, \ve{y} \in \Real^n$.
  2. $f(c\ve{x}) = c\,f(\ve{x})$ for all $c \in \Real$ and $\ve{x} \in \Real^n$.

15.6.3   Matrix-vector multiplication

A matrix is a mathematical object which naturally has a number of operations like addition, subtraction, multiplication, and so on that we can perform on it. In discussing operations on matrices, it is best to start with the operation that captures the essense of the matrix itself: matrix-vector multiplication.

15.6.4   Matrix-matrix multiplication

15.6.5   Other matrix operations

15.7   Affine transformations

15.8   Two equivalent views of affine transformations

15.9   Matrices in GLSL

15.10   Graphics math in Javascript

15.11   Summary


<< Contents