I would like to run my "Talking Head Anime from a Single Image" network in web browsers and on mobile devices. My networks were trained with PyTorch. The current plan is to convert the model to TensorFlow.js (TFJS) and try to run it in web browsers. If this goes well, I can use the TFJS model to develop mobile applications with tools like Flutter or React Native. I can also develop desktop applications with Electron.
However, the main problem is that my network uses layers that are not yet implemented in TFJS. These are affine_grid, grid_sample, and instance_norm.
I think I cannot really rely on the TFJS team to implement them, so I will have to do it myself.
Implementing new layers requires an understanding of how the underlying software works, and these notes were written as I tried to gain this knowledge. As I am interested in implementing inference, not training, I will not cover how gradients are computed in TFJS.
The engine is the object that implements all TFJS functionality. It does memory management for tensor objects and executes mathematical computations. There seems to be only one engine present at a time, and it is accessible from the global variable ENGINE and also by calling tf.engine(). Nevertheless, end users of TFJS rarely use it directly.
A backend is a part of the engine that carries out two functions: storing tensor data and executing kernels. A backend is represented by the KernelBackend class. It implements the TensorStorage interface, which allows one to read, write, and manage memory for tensor data.
An operation (op) is a function that takes a number of input tensors and produces a number of output tensors. It is an abstract operation that can have different implementations in different backends. Examples of ops include square, conv2d, and matMul. Ops are defined in the tfjs-core package.
A kernel is a backend-specific implementation of an op. A kernel can be executed by calling the runKernel function of the engine. To define a custom kernel, we first need to create a KernelConfig object, where we must specify the kernel's name, the backend's name, and a kernel function that conforms to the KernelFunc interface.
A kernel is identified by a lookup key that is made up of its name and its backend. TFJS maintains a global dictionary mapping keys to their corresponding KernelConfig objects, which it calls the kernelRegistry. The registerKernel function can be used to add new kernels to the registry.
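To make the lookup concrete, here is a self-contained sketch of how such a registry might work. Everything below, including the `${kernelName}_${backendName}` key format, is my own assumption for illustration, not TFJS's actual implementation.

```typescript
// A self-contained sketch (not TFJS's actual code) of a kernel registry.
// The `${kernelName}_${backendName}` key format is an assumption.
interface TensorInfo { dataId: object; shape: number[]; dtype: string; }
type KernelFunc = (params: {inputs: {[name: string]: TensorInfo}}) =>
    TensorInfo|TensorInfo[];
interface KernelConfig {
  kernelName: string;
  backendName: string;
  kernelFunc: KernelFunc;
}

const kernelRegistry = new Map<string, KernelConfig>();

function registerKernel(config: KernelConfig): void {
  kernelRegistry.set(`${config.kernelName}_${config.backendName}`, config);
}

function getKernel(kernelName: string, backendName: string):
    KernelConfig|undefined {
  return kernelRegistry.get(`${kernelName}_${backendName}`);
}

// Register a dummy 'Square' kernel for the 'cpu' backend.
registerKernel({
  kernelName: 'Square',
  backendName: 'cpu',
  kernelFunc: ({inputs}) => inputs['x'],  // placeholder body
});
```

With only the CPU variant registered, looking up getKernel('Square', 'webgl') would return undefined, which is why a kernel must be registered once per backend it supports.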
Tensor data is represented by the TensorData generic class. It contains:
- values, which is a BackendValues, which is in turn a Float32Array, an Int32Array, a Uint8Array, or an array of Uint8Arrays.
- dtype, which has a type that extends DataType, which is simply "float32"|"int32"|"bool"|"complex64"|"string".
- complexTensorInfo, which holds the real and imaginary parts when the tensor is complex.
- refCount, which counts the references to the data.
The data storage is implemented by the DataStorage<TensorData<DataType>> generic class, which contains a WeakMap from DataId (an arbitrary object) to TensorData<DataType>.
The backend maintains an increasing integer nextDataId, which is incremented every time new data is written to the backend through the write method.
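The storage and ID scheme can be sketched as follows. This is a simplified illustration under my own naming, not the actual TFJS class.

```typescript
// A minimal sketch (assumptions, not TFJS code) of a DataStorage-like class:
// a WeakMap from DataId objects to tensor data, plus an increasing
// nextDataId counter that is consumed on each write().
type DataId = object;
interface TensorData {
  values: Float32Array;
  shape: number[];
  dtype: string;
  refCount: number;
}

class DataStorageSketch {
  private data = new WeakMap<DataId, TensorData>();
  private nextDataId = 0;

  write(values: Float32Array, shape: number[], dtype: string): DataId {
    // An arbitrary fresh object serves as the WeakMap key.
    const dataId = {id: this.nextDataId++};
    this.data.set(dataId, {values, shape, dtype, refCount: 1});
    return dataId;
  }

  get(dataId: DataId): TensorData|undefined {
    return this.data.get(dataId);
  }
}
```

Because the key is an object held in a WeakMap, the data can be garbage-collected once nothing else references its DataId.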
From BatchMatMul.ts, it seems that the tensor data is stored in row-major order.
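As a quick illustration of row-major order (my own example, not code from TFJS):

```typescript
// Row-major flat indexing for a 2D tensor of shape [rows, cols]:
// element (i, j) lives at flat index i * cols + j.
function flatIndex(i: number, j: number, cols: number): number {
  return i * cols + j;
}
// For shape [2, 3], the values are laid out as
// [a00, a01, a02, a10, a11, a12], so flatIndex(1, 2, 3) === 5.
```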
For the 2D convolution operation, the input data can be stored in either the NCHW or the NHWC layout. This can be specified as an input to the Conv2D op. The filter weights, however, are stored in the [filterHeight, filterWidth, inDepth, outDepth] layout [LINK]. Nevertheless, the transposed convolution operation only supports input in the NHWC format [LINK].
It is, however, possible to invoke the underlying kernel, conv2DBackpropInput, with the 'NCHW' format.
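The two layouts can be summarized by their flat-index formulas. The functions below are my own illustration, not code from TFJS.

```typescript
// Sketch: flat index of element (n, h, w, c) under the two image layouts.
function nhwcIndex(n: number, h: number, w: number, c: number,
                   H: number, W: number, C: number): number {
  // NHWC: channels vary fastest.
  return ((n * H + h) * W + w) * C + c;
}

function nchwIndex(n: number, c: number, h: number, w: number,
                   C: number, H: number, W: number): number {
  // NCHW: width varies fastest; each channel forms a contiguous H*W plane.
  return ((n * C + c) * H + h) * W + w;
}
```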
An example kernel implementation is given here. We will study it in more detail.
KernelConfig
First, we need to construct an instance of KernelConfig, whose type definition is given below:
/** Config object for registering a kernel in the global registry. */
export interface KernelConfig {
kernelName: string;
backendName: string;
kernelFunc: KernelFunc;
setupFunc?: KernelSetupFunc;
disposeFunc?: KernelDisposeFunc;
}
We see that we must provide (1) the kernel name, (2) the backend name (e.g., "cpu"), and (3) the KernelFunc.
The main bulk of the work is in the KernelFunc, whose type definition is:
/** Specifies the code to run when executing a kernel. */
export type KernelFunc = (params: {
inputs: NamedTensorInfoMap,
backend: {},
attrs?: NamedAttrMap,
}) => TensorInfo|TensorInfo[];
Let's go through the types that are involved with the KernelFunc. The first is TensorInfo.
/** Holds metadata for a given tensor. */
export interface TensorInfo {
dataId: DataId;
shape: number[];
dtype: DataType;
}
Recall that the dataId field holds an arbitrary object. In the CPU backend, this object holds an integer ID of the tensor.
The next type is the parameter object of the KernelFunc:
{
inputs: NamedTensorInfoMap,
backend: {},
attrs?: NamedAttrMap,
}
It has three fields:
inputs is a collection of named tensors, which is represented by the NamedTensorInfoMap interface.
export interface NamedTensorInfoMap {
[name: string]: TensorInfo;
}
Kernels will specialize this type using the Pick utility type. For example, the SquareInputs type, which represents the tensor inputs to the Square kernel, is defined as follows:
export type SquareInputs = Pick<NamedTensorInfoMap, 'x'>;
backend is the backend that is going to run the kernel.
attrs represents a collection of named "attributes," where an attribute is just a non-tensor input. Its type is NamedAttrMap. Here's the collection of types that are used to represent attributes.
export interface NamedAttrMap {
[name: string]: Attribute;
}
/** These are extra non-tensor/primitive params passed to kernel functions. */
export type Attribute = AttributeValue|RecursiveArray<AttributeValue>;
type AttributeValue =
number|number[]|boolean|boolean[]|string|string[]|NamedAttrMap;
export interface RecursiveArray<T extends any> {
[index: number]: T|RecursiveArray<T>;
}
Again, kernels will specialize this type as follows:
export interface FusedConv2DAttrs {
strides: [number, number]|number;
pad: 'valid'|'same'|number|ExplicitPadding;
dataFormat: 'NHWC'|'NCHW';
dilations: [number, number]|number;
dimRoundingMode: 'floor'|'round'|'ceil';
activation: Activation;
leakyreluAlpha?: number;
}
KernelFunc
Now, let's look at the implementation of the Square kernel.
import {Square, SquareInputs} from '@tensorflow/tfjs-core';
import {KernelConfig} from '@tensorflow/tfjs-core';
import {MathBackendCPU} from '../backend_cpu';
import {assertNotComplex} from '../cpu_util';
export const squareConfig: KernelConfig = {
kernelName: Square,
backendName: 'cpu',
kernelFunc: ({inputs, backend}) => {
const {x} = inputs as SquareInputs;
const cpuBackend = backend as MathBackendCPU;
assertNotComplex(x, 'square');
const values = cpuBackend.data.get(x.dataId).values as Float32Array;
const newValues = new Float32Array(values.length);
for (let i = 0; i < values.length; ++i) {
const value = values[i];
newValues[i] = value * value;
}
const dataId = cpuBackend.write(newValues, x.shape, x.dtype);
return {dataId, shape: x.shape, dtype: x.dtype};
}
};
We see that there are a number of steps to follow.
First, cast the inputs field of the parameter object to the specific input type of the kernel, then extract the relevant individual inputs.
const {x} = inputs as SquareInputs;
Here, we see that x is a TensorInfo object that represents the input tensor.
Next, cast the backend field to the specific backend that the kernel works with.
const cpuBackend = backend as MathBackendCPU;
Next, compute the output values and write them to the backend.
const values = cpuBackend.data.get(x.dataId).values as Float32Array;
const newValues = new Float32Array(values.length);
for (let i = 0; i < values.length; ++i) {
const value = values[i];
newValues[i] = value * value;
}
const dataId = cpuBackend.write(newValues, x.shape, x.dtype);
Note that the write function of all backends is supposed to return a new DataId.
Finally, create an object that conforms to the TensorInfo interface and return it as output.
return {dataId, shape: x.shape, dtype: x.dtype};
After creating the KernelConfig, we need to call the registerKernel function on it. For the CPU backend, all kernels are registered in register_all_kernels.ts.
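Since instance_norm is one of the layers I need, here is a rough, self-contained sketch of the core computation a custom CPU kernel for it might perform. Everything here (the function name, the NCHW assumption, and the omission of the learned scale and offset parameters) is my own simplification for illustration, not TFJS or PyTorch code.

```typescript
// Hypothetical sketch of an instance-norm-style computation on raw values:
// normalize each (batch, channel) spatial plane of an NCHW tensor to zero
// mean and unit variance. A real TFJS kernel would wrap this in a KernelFunc
// that reads inputs from the backend and writes the result back.
function instanceNormValues(
    values: Float32Array, shape: [number, number, number, number],
    epsilon = 1e-5): Float32Array {
  const [n, c, h, w] = shape;
  const planeSize = h * w;
  const out = new Float32Array(values.length);
  for (let b = 0; b < n; ++b) {
    for (let ch = 0; ch < c; ++ch) {
      const offset = (b * c + ch) * planeSize;
      // Mean over the spatial plane.
      let mean = 0;
      for (let i = 0; i < planeSize; ++i) mean += values[offset + i];
      mean /= planeSize;
      // Variance over the spatial plane.
      let variance = 0;
      for (let i = 0; i < planeSize; ++i) {
        const d = values[offset + i] - mean;
        variance += d * d;
      }
      variance /= planeSize;
      const invStd = 1 / Math.sqrt(variance + epsilon);
      for (let i = 0; i < planeSize; ++i) {
        out[offset + i] = (values[offset + i] - mean) * invStd;
      }
    }
  }
  return out;
}
```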
The WebGL backend is implemented by the MathBackendWebGL class. It has a field called texData, which has type DataStorage<TextureData>, meaning that the TextureData interface represents storage of texture data. Here's the definition of the interface:
export interface TextureData {
// Required.
shape: number[];
dtype: DataType;
// Optional.
values?: backend_util.BackendValues;
texture?: WebGLTexture;
// For complex numbers, the real and imaginary parts are stored as their own
// individual tensorInfos, with a parent joining the two with the
// complexTensors field. When this is defined, texture will be null.
complexTensorInfos?: {real: TensorInfo, imag: TensorInfo};
/** [rows, columns] shape of the texture. */
texShape?: [number, number];
usage?: TextureUsage;
isPacked?: boolean;
refCount: number;
// Available when the tensor has been sliced.
slice?: {
// Offset in the 'flat index' space.
flatOffset: number;
// Used for counting how many sliced tensors point to the same texture.
origDataId: DataId;
};
}
Let's dig into some of the fields:
texture stores a WebGLTexture. This is not a TFJS-specific type, as it comes from WebGL. [LINK]
values stores data that has not been uploaded to the GPU yet.
Because texture and values are both optional, the backend can deal both with data that have been uploaded to the GPU and with data that have not.
usage stores an enum of type TextureUsage, which has the following definition:
export enum TextureUsage {
RENDER,
UPLOAD,
PIXELS,
DOWNLOAD
}
I don't quite understand the meaning of the TextureUsage enum yet. I observed, however, that when data is written to the backend, usage is set to UPLOAD. [LINK]
isPacked gives information about how the texture data is laid out. We will cover what this means momentarily.
Computation is carried out by running shaders on the GPU. This is done through the runWebGLProgram method of the MathBackendWebGL class. Here's the signature of the method:
runWebGLProgram(
program: GPGPUProgram,
inputs: TensorInfo[],
outputDtype: DataType,
customSetup?: (gpgpu: GPGPUContext, webGLProgram: WebGLProgram) => void,
preventEagerUnpackingOfOutput = false): TensorInfo
Here's how the GPGPUProgram interface is defined:
export interface GPGPUProgram {
variableNames: string[];
outputShape: number[];
userCode: string;
/** If true, this program expects packed input textures. Defaults to false. */
packedInputs?: boolean;
/** If true, this program produces a packed texture. Defaults to false. */
packedOutput?: boolean;
/**
* Affects what type of texture we allocate for the output. Defaults to
* `TextureUsage.RENDER`.
*/
outTexUsage?: TextureUsage;
/**
* The type of scheme to use when packing texels for the output values.
* See `PackingScheme` for details. Defaults to `PackingScheme.SHARED_BATCH`.
*/
outPackingScheme?: PackingScheme;
}
PackingScheme is defined as follows:
export enum PackingScheme {
/**
* All values in a single texel are densely packed without any constraints.
*
* This is how the shader encodes a tensor with shape = [2, 3, 4]
* (indices are [batch, row, col]).
*
* 000|001 010|011 020|021
* ------- ------- -------
* 002|003 012|013 022|023
*
* 100|101 110|111 120|121
* ------- ------- -------
* 102|103 112|113 122|123
*
*/
DENSE,
/**
* Single texels contain only values from the same batch, and from adjacent
* rows and columns.
*
* This is how the shader encodes a tensor with shape = [2, 3, 5]
* (indices are [batch, row, col]).
*
* 000|001 002|003 004|xxx 020|021 022|023 024|xxx
* ------- ------- ------- ------- ------- -------
* 010|011 012|013 014|xxx xxx|xxx xxx|xxx xxx|xxx
*
* 100|101 102|103 104|xxx 120|121 122|123 124|xxx
* ------- ------- ------- ------- ------- -------
* 110|111 112|113 114|xxx xxx|xxx xxx|xxx xxx|xxx
*
*/
SHARED_BATCH
}
Now, I searched for SHARED_BATCH across the repository, and it seems that it is not used anywhere, despite being documented as the default value.
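The SHARED_BATCH diagram can be summarized as a coordinate mapping: each texel holds a 2x2 block of adjacent rows and columns from the same batch. The function below is my own reconstruction from the diagram, not code from TFJS.

```typescript
// Sketch of SHARED_BATCH-style packing: within one batch, element (row, col)
// lands in texel (floor(row/2), floor(col/2)), in channel order
// (r, g, b, a) = (row0col0, row0col1, row1col0, row1col1).
function packedLocation(row: number, col: number):
    {texRow: number, texCol: number, channel: 'r'|'g'|'b'|'a'} {
  const channels: Array<'r'|'g'|'b'|'a'> = ['r', 'g', 'b', 'a'];
  return {
    texRow: Math.floor(row / 2),
    texCol: Math.floor(col / 2),
    channel: channels[(row % 2) * 2 + (col % 2)],
  };
}
```

For odd row or column counts, the leftover channels are the xxx entries in the diagram: slots in the texel that carry no tensor value.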
GPGPUProgram Examples
Let's see an example of a concrete instance of GPGPUProgram. The simplest one is the UnaryOpProgram.
export class UnaryOpProgram implements GPGPUProgram {
variableNames = ['A'];
userCode: string;
outputShape: number[];
constructor(aShape: number[], opSnippet: string) {
this.outputShape = aShape;
this.userCode = `
float unaryOperation(float x) {
${opSnippet}
}
void main() {
float x = getAAtOutCoords();
float y = unaryOperation(x);
setOutput(y);
}
`;
}
}
Note that the user code is not a complete program. It is only a snippet that must be combined with another piece of code in order to make a functioning fragment shader. In particular, two functions are not defined:
- getAAtOutCoords(), which seems to derive its name from the variableNames = ['A'] line. The return type is a float.
- setOutput(), which should set the output value at the output coordinates.
Here are some samples of opSnippet found within the code file.
export const CHECK_NAN_SNIPPET = `if (isnan(x)) return x;`;
export const LINEAR = `return x;`;
export const ABS = `return abs(x);`;
export function STEP(alpha = 0.0) {
return CHECK_NAN_SNIPPET + `
return x > 0.0 ? 1.0 : float(${alpha});
`;
}
export const ELU = `return (x >= 0.0) ? x : (exp(x) - 1.0);`;
export const RELU = CHECK_NAN_SNIPPET + `
return (x < 0.0) ? 0.0 : x;
`;
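To see how a snippet becomes user code, here is a sketch of the splicing that UnaryOpProgram's constructor performs. The helper name is my own, and the snippet is the combination of CHECK_NAN_SNIPPET and RELU from above.

```typescript
// Sketch: splice an op snippet into the unary-op userCode template, mirroring
// what UnaryOpProgram's constructor does. Illustration only.
const RELU_SNIPPET = `if (isnan(x)) return x;
return (x < 0.0) ? 0.0 : x;`;

function makeUnaryUserCode(opSnippet: string): string {
  return `
    float unaryOperation(float x) {
      ${opSnippet}
    }

    void main() {
      float x = getAAtOutCoords();
      float y = unaryOperation(x);
      setOutput(y);
    }
  `;
}

// The result is a GLSL fragment with the RELU logic spliced into
// unaryOperation's body.
const reluUserCode = makeUnaryUserCode(RELU_SNIPPET);
```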
However, there's another version, the UnaryOpPackedProgram.
export class UnaryOpPackedProgram implements GPGPUProgram {
variableNames = ['A'];
userCode: string;
outputShape: number[];
packedInputs = true;
packedOutput = true;
constructor(aShape: number[], opSnippet: string) {
this.outputShape = aShape;
this.userCode = `
vec4 unaryOperation(vec4 x) {
${opSnippet}
}
void main() {
vec4 x = getAAtOutCoords();
vec4 y = unaryOperation(x);
setOutput(y);
}
`;
}
}
This time, getAAtOutCoords outputs a vec4, and setOutput also accepts a vec4 instead of a float. Here are examples of opSnippet found in the same file.
export const LINEAR = `return x;`;
export const ELU = `
vec4 result;
result.r = (x.r >= 0.0) ? x.r : (exp(x.r) - 1.0);
result.g = (x.g >= 0.0) ? x.g : (exp(x.g) - 1.0);
result.b = (x.b >= 0.0) ? x.b : (exp(x.b) - 1.0);
result.a = (x.a >= 0.0) ? x.a : (exp(x.a) - 1.0);
return result;
`;
export const RELU = `
vec4 result = x * vec4(greaterThanEqual(x, vec4(0.0)));
bvec4 isNaN = isnan(x);
result.r = isNaN.r ? x.r : result.r;
result.g = isNaN.g ? x.g : result.g;
result.b = isNaN.b ? x.b : result.b;
result.a = isNaN.a ? x.a : result.a;
return result;
`;
So, it would seem that, if the texture is packed, a texel stores an RGBA value (four floats). Otherwise, it stores only a single float.
Last modified: 2021/05/08