output = layers(layers(layers(layers(input))))
instead of the classical stack of four distinct layers:
output = layer4(layer3(layer2(layer1(input))))
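To make the contrast concrete, here is a minimal Python sketch. It assumes a toy elementwise "layer" (the names `make_layer`, `looped`, and `stacked` are hypothetical, not from any library). The looped form reuses one set of weights for every pass. The classical form has four layers, each with its own weights.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float, bias: float) -> Layer:
    """Hypothetical toy layer: elementwise tanh(scale * v + bias)."""
    def layer(x: Vector) -> Vector:
        return [math.tanh(scale * v + bias) for v in x]
    return layer


def stacked(layer_list: List[Layer], x: Vector) -> Vector:
    """Classical depth: layer4(layer3(layer2(layer1(x)))), one weight set per layer."""
    for layer in layer_list:
        x = layer(x)
    return x


def looped(layer: Layer, x: Vector, n_passes: int) -> Vector:
    """Looped depth: the same layer (same weights) applied n_passes times."""
    for _ in range(n_passes):
        x = layer(x)
    return x


shared = make_layer(0.9, 0.1)                               # one parameter set
distinct = [make_layer(0.9, 0.1 * k) for k in range(1, 5)]  # four parameter sets
x0 = [0.5, -1.0, 2.0]

out_looped = looped(shared, x0, 4)   # layers(layers(layers(layers(x0))))
out_stacked = stacked(distinct, x0)  # layer4(layer3(layer2(layer1(x0))))
```

The looped version stores a single layer's parameters no matter how many passes it makes. Because of that, the number of passes can change at inference time without adding weights.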
The number of passes could also vary per token, e.g.
output = layers(input)
or
output = layers(layers(input))
depending on how difficult the token is.
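One way to decide: let the shared block also emit a stop flag. The loop runs until the flag is set or a maximum step count t_max is reached:
x = tokenize(input)
i = 0
do {
    finish, x = layers(x)
} while (!finish && i++ < t_max);
output = lm_head(x)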
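The loop above can be sketched in Python as follows. The sketch assumes a hypothetical `toy_block` that halves the state and raises the stop flag once every value is below 0.1. `tokenize` and `lm_head` are omitted, and `t_max` is taken here as the maximum number of passes.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical block signature: returns (stop flag, new state).
Block = Callable[[Vector], Tuple[bool, Vector]]


def run_with_halting_flag(block: Block, x: Vector, t_max: int) -> Tuple[Vector, int]:
    """Apply the shared block until it signals stop or t_max passes are done."""
    passes = 0
    while True:  # do-while: the block always runs at least once
        finish, x = block(x)
        passes += 1
        if finish or passes >= t_max:
            return x, passes


def toy_block(x: Vector) -> Tuple[bool, Vector]:
    """Hypothetical stand-in for `layers`: halve the state, stop when it is small."""
    y = [0.5 * v for v in x]
    finish = max(abs(v) for v in y) < 0.1
    return finish, y
```

An "easy" input such as `[0.15]` stops after one pass. `[1.0]` needs four passes, or is cut off earlier if `t_max` is smaller.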
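A softer variant, in the spirit of Adaptive Computation Time, has the block emit a halting probability p. The loop stops once the accumulated probability reaches a threshold (0.95 here):
x = tokenize(input)
i = 0
finish = 0
do {
    p, x = layers(x)
    finish += p
} while (finish < 0.95 && i++ < t_max);
output = lm_head(x)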
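A minimal Python sketch of this probabilistic loop follows. `make_toy_block` is hypothetical: it returns a fixed halting probability per pass and halves the state, whereas a real model would compute p from the state, for example with a sigmoid. As in the pseudocode, the final state is used directly. Adaptive Computation Time proper instead averages the intermediate states weighted by their halting probabilities, with a remainder term, and this sketch does not do that.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical block signature: returns (halting probability p, new state).
Block = Callable[[Vector], Tuple[float, Vector]]


def run_with_halting_probability(block: Block, x: Vector, t_max: int,
                                 threshold: float = 0.95) -> Tuple[Vector, int, float]:
    """Loop the shared block until the summed halting probability reaches
    `threshold` or `t_max` passes have been made."""
    finish = 0.0  # cumulative halting probability
    passes = 0
    while True:   # do-while: the block always runs at least once
        p, x = block(x)
        finish += p
        passes += 1
        if finish >= threshold or passes >= t_max:
            return x, passes, finish


def make_toy_block(p_halt: float) -> Block:
    """Hypothetical stand-in for `layers`: fixed p per pass, state halved."""
    def block(x: Vector) -> Tuple[float, Vector]:
        return p_halt, [0.5 * v for v in x]
    return block
```

With `p_halt=0.6` the loop stops after two passes (0.6, then 1.2). With `p_halt=0.2` it needs five passes to reach the threshold, so a low per-pass halting probability plays the role of a "difficult" token.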