Haiku Basics

In this Colab, you will learn the basics of Haiku.

What and Why?

Haiku is a simple neural network library for JAX that lets users work with familiar object-oriented programming models while keeping full access to JAX’s pure function transformations. It is designed to simplify common tasks such as managing model parameters and other model state, and is similar in spirit to the Sonnet library that has been widely used across DeepMind: it preserves Sonnet’s module-based programming model for state management. Haiku is expected to compose with other libraries and to work well with the rest of JAX.

[1]:
import haiku as hk
import jax
import jax.numpy as jnp
import numpy as np

A first example with hk.transform

As a first introduction to Haiku, let us construct a linear module whose weights and biases use custom initializers.

Similar to Sonnet modules, Haiku modules are Python objects that hold references to their own parameters, other modules, and methods that apply functions on user inputs. However, since JAX operates on pure functions, Haiku modules cannot be instantiated directly outside a transformed function. Instead, the code that creates and calls the modules needs to be wrapped in a function transformation.

Haiku provides a simple function transformation, hk.transform, that turns functions that use these object-oriented, functionally “impure” modules into pure functions that can be used with JAX.

[2]:
class MyLinear1(hk.Module):

  def __init__(self, output_size, name=None):
    super().__init__(name=name)
    self.output_size = output_size

  def __call__(self, x):
    j, k = x.shape[-1], self.output_size
    w_init = hk.initializers.TruncatedNormal(1. / np.sqrt(j))
    w = hk.get_parameter("w", shape=[j, k], dtype=x.dtype, init=w_init)
    b = hk.get_parameter("b", shape=[k], dtype=x.dtype, init=jnp.ones)
    return jnp.dot(x, w) + b
[3]:
def _forward_fn_linear1(x):
  module = MyLinear1(output_size=2)
  return module(x)

forward_linear1 = hk.transform(_forward_fn_linear1)

We see that the forward wrapper object now contains two methods, init and apply, that are used to initialize the variables and do forward inference on the module.

[4]:
forward_linear1
[4]:
Transformed(init=<function without_state.<locals>.init_fn at 0x7fa22fe754c0>, apply=<function without_state.<locals>.apply_fn at 0x7fa22fe75550>)
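
To make the idea of an init/apply pair more concrete, here is a hand-written, pure-JAX analogue of what the transformed MyLinear1 provides. This is only an illustrative sketch and not how Haiku is implemented internally. The names manual_init and manual_apply are hypothetical, and the initializer scaling is assumed to roughly mirror the TruncatedNormal initializer used above.

```python
import jax
import jax.numpy as jnp


def manual_init(rng, x, output_size=2):
  """Builds the parameters explicitly and returns them as a nested dict."""
  j, k = x.shape[-1], output_size
  stddev = 1.0 / (j ** 0.5)
  # Sample from a standard normal truncated to [-2, 2], then rescale.
  w = stddev * jax.random.truncated_normal(rng, -2.0, 2.0, (j, k), x.dtype)
  b = jnp.ones((k,), x.dtype)
  return {"my_linear1": {"w": w, "b": b}}


def manual_apply(params, x):
  """Pure function: the output depends only on its arguments."""
  p = params["my_linear1"]
  return jnp.dot(x, p["w"]) + p["b"]


x = jnp.array([[1., 2., 3.]])
manual_params = manual_init(jax.random.PRNGKey(42), x)
manual_output = manual_apply(manual_params, x)
print(manual_output.shape)
```

Writing both functions by hand quickly becomes tedious for larger models, which is the bookkeeping that hk.transform takes over.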

Calling the init method initializes the parameters of the network and returns them, as can be seen below. The init method takes a jax.random.PRNGKey and a sample input (usually just dummy values that tell the network which shapes to expect).

[5]:
dummy_x = jnp.array([[1., 2., 3.]])
rng_key = jax.random.PRNGKey(42)

params = forward_linear1.init(rng=rng_key, x=dummy_x)
print(params)
/tmp/haiku-docs-env/lib/python3.8/site-packages/jax/lib/xla_bridge.py:130: UserWarning: No GPU/TPU found, falling back to CPU.
  warnings.warn('No GPU/TPU found, falling back to CPU.')
FlatMapping({
  'my_linear1': FlatMapping({
                  'w': DeviceArray([[-0.30350363,  0.5123802 ],
                                    [ 0.08009142, -0.3163005 ],
                                    [ 0.6056666 ,  0.5820702 ]], dtype=float32),
                  'b': DeviceArray([1., 1.], dtype=float32),
                }),
})

We can now use the params to apply the forward function to some inputs.

[6]:
sample_x = jnp.array([[1., 2., 3.]])
sample_x_2 = jnp.array([[4., 5., 6.], [7., 8., 9.]])

output_1 = forward_linear1.apply(params=params, x=sample_x, rng=rng_key)
# Outputs are identical for given inputs since the forward inference is non-stochastic.
output_2 = forward_linear1.apply(params=params, x=sample_x, rng=rng_key)

output_3 = forward_linear1.apply(params=params, x=sample_x_2, rng=rng_key)

print(f'Output 1 : {output_1}')
print(f'Output 2 (same as output 1): {output_2}')
print(f'Output 3 : {output_3}')
Output 1 : [[2.6736789 2.6259897]]
Output 2 (same as output 1): [[2.6736789 2.6259897]]
Output 3 : [[3.820442 4.960439]
 [4.967205 7.294889]]

Inference without random key

The module that we built is inherently non-stochastic, so passing a random key to the apply method is redundant. Haiku offers another transformation, hk.without_apply_rng, which wraps the result of hk.transform so that apply no longer takes a random key.

[7]:
forward_without_rng = hk.without_apply_rng(hk.transform(_forward_fn_linear1))
params = forward_without_rng.init(rng=rng_key, x=sample_x)
output = forward_without_rng.apply(x=sample_x, params=params)
print(f'Output without random key in forward pass \n {output}')
Output without random key in forward pass
 [[2.6736789 2.6259897]]

We can also build a modified ("mutated") copy of the parameters and run the forward pass again to get a different output for the same inputs. This is what happens when gradient descent updates the parameters during training.

[8]:
mutated_params = jax.tree_util.tree_map(lambda x: x+1., params)
print(f'Mutated params \n : {mutated_params}')
mutated_output = forward_without_rng.apply(x=sample_x, params=mutated_params)
print(f'Output with mutated params \n {mutated_output}')
Mutated params
 : FlatMapping({
  'my_linear1': FlatMapping({
                  'b': DeviceArray([2., 2.], dtype=float32),
                  'w': DeviceArray([[0.69649637, 1.5123801 ],
                                    [1.0800915 , 0.6836995 ],
                                    [1.6056666 , 1.5820701 ]], dtype=float32),
                }),
})
Output with mutated params
 [[9.673679 9.62599 ]]

Stateful Inference in Haiku

For some modules you may want to maintain internal state and carry it across function calls. Here we show a simple example in which a state variable, counter, is declared inside the transformed function and updated on every call. Because the transformed function is pure, the updated state is returned by apply and must be passed back in on the next call. Note that we did not wrap this in a Haiku module; the same could be written as an hk.Module, as shown earlier.

[9]:
def stateful_f(x):
  counter = hk.get_state("counter", shape=[], dtype=jnp.int32, init=jnp.ones)
  multiplier = hk.get_parameter('multiplier', shape=[1,], dtype=x.dtype, init=jnp.ones)
  hk.set_state("counter", counter + 1)
  output = x + multiplier * counter
  return output

stateful_forward = hk.without_apply_rng(hk.transform_with_state(stateful_f))
sample_x = jnp.array([[5., ]])
params, state = stateful_forward.init(x=sample_x, rng=rng_key)
print(f'Initial params:\n{params}\nInitial state:\n{state}')
print('##########')
for i in range(3):
  output, state = stateful_forward.apply(params, state, x=sample_x)
  print(f'After {i+1} iterations:\nOutput: {output}\nState: {state}')
  print('##########')
Initial params:
FlatMapping({
  '~': FlatMapping({'multiplier': DeviceArray([1.], dtype=float32)}),
})
Initial state:
FlatMapping({'~': FlatMapping({'counter': DeviceArray(1, dtype=int32)})})
##########
After 1 iterations:
Output: [[6.]]
State: FlatMapping({'~': FlatMapping({'counter': DeviceArray(2, dtype=int32)})})
##########
After 2 iterations:
Output: [[7.]]
State: FlatMapping({'~': FlatMapping({'counter': DeviceArray(3, dtype=int32)})})
##########
After 3 iterations:
Output: [[8.]]
State: FlatMapping({'~': FlatMapping({'counter': DeviceArray(4, dtype=int32)})})
##########

Built-in Haiku nets and nested modules

Commonly used networks such as MLPs and ConvNets are already defined in Haiku, and we can compose those modules to build a custom Haiku module.

Look at the params dictionary to see how the params are nested in the same way as the modules are nested within our custom Haiku module.

[10]:
# See: https://dm-haiku.readthedocs.io/en/latest/api.html#common-modules

class MyModuleCustom(hk.Module):
  def __init__(self, output_size=2, name='custom_linear'):
    super().__init__(name=name)
    self._internal_linear_1 = hk.nets.MLP(output_sizes=[2, 3], name='hk_internal_linear')
    self._internal_linear_2 = MyLinear1(output_size=output_size, name='old_linear')

  def __call__(self, x):
    return self._internal_linear_2(self._internal_linear_1(x))

def _custom_forward_fn(x):
  module = MyModuleCustom()
  return module(x)

custom_forward_without_rng = hk.without_apply_rng(hk.transform(_custom_forward_fn))
params = custom_forward_without_rng.init(rng=rng_key, x=sample_x)
params
[10]:
FlatMapping({
  'custom_linear/~/hk_internal_linear/~/linear_0': FlatMapping({
                                                     'w': DeviceArray([[ 1.51595   , -0.23353337]], dtype=float32),
                                                     'b': DeviceArray([0., 0.], dtype=float32),
                                                   }),
  'custom_linear/~/hk_internal_linear/~/linear_1': FlatMapping({
                                                     'w': DeviceArray([[-0.22075887, -0.27375957,  0.5931483 ],
                                                                       [ 0.7818068 ,  0.72626334, -0.6860752 ]], dtype=float32),
                                                     'b': DeviceArray([0., 0., 0.], dtype=float32),
                                                   }),
  'custom_linear/~/old_linear': FlatMapping({
                                  'w': DeviceArray([[ 0.28584382,  0.31626168],
                                                    [ 0.2335775 , -0.4827032 ],
                                                    [-0.14647584, -0.7185701 ]], dtype=float32),
                                  'b': DeviceArray([1., 1.], dtype=float32),
                                }),
})

RNG Keys with hk.next_rng_key()

The modules that we saw earlier were all non-stochastic. Below we show how to sample random numbers to do stochastic inference.

Haiku offers a simple model for working with random numbers. Within a transformed function, hk.next_rng_key() returns a unique RNG key. These unique keys are deterministically derived from the initial random key passed into the top-level transformed function, and are thus safe to use with JAX program transformations.

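Let us define a simple Haiku function in which we generate two random samples. The keys returned by hk.next_rng_key() are derived from the random key passed to the init or apply method of the top-level transformed function. Since the example below passes the same rng_key on every call, every iteration prints the same samples; the first line of output, printed before "Iteration 1", comes from the call to init.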

[11]:
class HkRandom2(hk.Module):
  def __init__(self, rate=0.5):
    super().__init__()
    self.rate = rate

  def __call__(self, x):
    key1 = hk.next_rng_key()
    return jax.random.bernoulli(key1, 1.0 - self.rate, shape=x.shape)


class HkRandomNest(hk.Module):
  def __init__(self, rate=0.5):
    super().__init__()
    self.rate = rate
    self._another_random_module = HkRandom2()

  def __call__(self, x):
    key2 = hk.next_rng_key()
    p1 = self._another_random_module(x)
    p2 = jax.random.bernoulli(key2, 1.0 - self.rate, shape=x.shape)
    print(f'Bernoullis are  : {p1, p2}')
    return p1, p2

# Note: stochastic modules cannot be wrapped with hk.without_apply_rng(),
# because their apply method needs a random key.
forward = hk.transform(lambda x: HkRandomNest()(x))

x = jnp.array(1.)
params = forward.init(rng_key, x=x)
for i in range(5):
  print(f'\n Iteration {i+1}')
  prediction = forward.apply(params, x=x, rng=rng_key)

Bernoullis are  : (DeviceArray(True, dtype=bool), DeviceArray(False, dtype=bool))

 Iteration 1
Bernoullis are  : (DeviceArray(True, dtype=bool), DeviceArray(False, dtype=bool))

 Iteration 2
Bernoullis are  : (DeviceArray(True, dtype=bool), DeviceArray(False, dtype=bool))

 Iteration 3
Bernoullis are  : (DeviceArray(True, dtype=bool), DeviceArray(False, dtype=bool))

 Iteration 4
Bernoullis are  : (DeviceArray(True, dtype=bool), DeviceArray(False, dtype=bool))

 Iteration 5
Bernoullis are  : (DeviceArray(True, dtype=bool), DeviceArray(False, dtype=bool))