Cheat Sheet intermediate · 8 min read

TensorFlow Cheat Sheet — Deep Learning Fundamentals

version 2.16.x

Deep learning framework for production ML

install pip install tensorflow (GPU on Linux: pip install "tensorflow[and-cuda]"; the separate tensorflow-gpu package is deprecated)

core imports

python

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks

Mental model

Compute graphs and automatic differentiation for neural networks at scale.

Like a factory assembly line: raw materials (inputs) flow through stations (layers), get transformed (operations), and feedback flows backward (gradients) to optimize each station.

Key Concepts

Tensor

Multi-dimensional array (scalar, vector, matrix, or higher-order array) that holds data flowing through the computation graph.
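
To make rank and shape concrete, here is a minimal sketch with arbitrary values; the variable names are illustrative only.

```python
import tensorflow as tf

scalar = tf.constant(3.0)                       # rank 0, shape ()
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1, shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2, shape (2, 2)
images = tf.zeros([8, 28, 28, 1])               # rank 4: (batch, height, width, channels)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("images", images)]:
    print(name, tuple(t.shape), t.dtype.name, "rank", len(t.shape))
```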

Keras API

High-level neural network library integrated into TensorFlow 2.x; provides Sequential and Functional model builders and pre-built layers.

Layer

Encapsulates weights, biases, and transformations (Dense, Conv2D, LSTM, Embedding); the building block of models.

Model

Composition of layers connected in a computation graph; accepts inputs, processes through layers, produces outputs.

GradientTape

Context manager that records operations for automatic differentiation; enables custom training loops with fine-grained gradient control.

Callback

Object whose hooks run at set points during training (start or end of training, each epoch, or each batch); built-ins such as EarlyStopping and ModelCheckpoint monitor and modify training behavior.
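
A custom callback is a subclass of keras.callbacks.Callback that overrides the hooks it needs. The sketch below is one possible shape: the LossRecorder class and the synthetic data are hypothetical, used only to drive the hooks.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

class LossRecorder(keras.callbacks.Callback):
    """Hypothetical callback that stores the training loss after each epoch."""

    def on_train_begin(self, logs=None):
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        self.losses.append((logs or {}).get("loss"))

# Tiny synthetic regression problem, just to exercise the callback.
x = np.random.rand(64, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = keras.Sequential([layers.Input(shape=(4,)), layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

recorder = LossRecorder()
model.fit(x, y, epochs=3, batch_size=16, verbose=0, callbacks=[recorder])
print(recorder.losses)  # one entry per epoch
```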

Optimizer

Algorithm that updates model weights using gradients (Adam, SGD, RMSprop); controls learning rate and momentum.

Loss Function

Measures prediction error; guides gradient descent to minimize the difference between predicted and actual values.
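
As a worked example, sparse categorical cross-entropy for one sample is the negative log of the probability assigned to the correct class. The sketch below compares the Keras loss with that formula; the probabilities are made up.

```python
import math
import tensorflow as tf

y_true = tf.constant([1])                  # index of the correct class
y_prob = tf.constant([[0.1, 0.7, 0.2]])    # predicted class probabilities

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
keras_loss = float(loss_fn(y_true, y_prob))
manual_loss = -math.log(0.7)               # -log(p[correct class])

print(f"keras: {keras_loss:.4f}  manual: {manual_loss:.4f}")
```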

TensorFlow Patterns

01 Sequential Model (Linear Stack)

Simple feedforward networks with one input, one output.

python

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(x_train, y_train, epochs=10, batch_size=32)

output

Epoch 1/10
32/32 [==============================] - 0s 1ms/step - loss: 2.3041 - accuracy: 0.1089

Input shape must match training data dimensions. Sequential expects (batch_size, features); use Input layer for explicit shape declaration.

02 Functional API (Multi-Input/Output)

Complex architectures with branching, skip connections, or multiple inputs.

python

from tensorflow.keras import layers, models, Input

inputs = Input(shape=(784,))
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dropout(0.2)(x)
branch1 = layers.Dense(64, activation='relu')(x)
branch2 = layers.Dense(64, activation='relu')(x)
merged = layers.Concatenate()([branch1, branch2])
outputs = layers.Dense(10, activation='softmax')(merged)

model = models.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Each layer is a callable object; calling it on a tensor connects it to the graph, so every connection is explicit. Intermediate outputs can branch and merge.
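
A skip (residual) connection is a common reason to reach for the Functional API. A minimal sketch, assuming the skipped tensors have matching shapes; the layer sizes are arbitrary.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64,))
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dense(64)(x)
x = layers.Add()([x, inputs])        # skip connection: both tensors are (None, 64)
x = layers.Activation("relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
```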

03 Custom Training Loop (GradientTape)

Non-standard loss computations, custom backprop, or research-grade control.

python

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers

model = keras.Sequential([layers.Dense(10)])
optimizer = optimizers.Adam(learning_rate=0.01)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Build the batched dataset once instead of on every epoch
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

for epoch in range(10):
    for x_batch, y_batch in train_ds:
        with tf.GradientTape() as tape:
            logits = model(x_batch, training=True)  # forward pass is recorded on the tape
            loss_value = loss_fn(y_batch, logits)

        # Backward pass and weight update
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        print(f'Loss: {float(loss_value):.4f}')

GradientTape automatically watches trainable tf.Variable objects; other tensors must be registered with tape.watch(), and NumPy computations are never recorded. Set training=True for correct batch norm/dropout behavior.

04 Callbacks for Training Control

Early stopping, checkpointing, learning rate scheduling, or custom monitoring.

python

from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau,
    TensorBoard
)

callbacks_list = [
    EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    ModelCheckpoint(
        'best_model.keras',
        monitor='val_accuracy',
        save_best_only=True
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-7
    ),
    TensorBoard(log_dir='./logs')
]

model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=100,
    callbacks=callbacks_list
)

EarlyStopping defaults to mode='auto'; monitor metric name must match training output. patience is epochs without improvement before stopping.

05 tf.data Pipeline (Efficient Loading)

Large datasets, disk-to-GPU streaming, preprocessing at scale.

python

import tensorflow as tf

def augment(image, label):
    # Assumes each element of x_train is one image of shape (height, width, channels);
    # drop this map step for non-image data
    return tf.image.random_flip_left_right(image), label

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.map(augment, num_parallel_calls=tf.data.AUTOTUNE)  # per example, before batching
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)

model.fit(dataset, epochs=10)

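prefetch() should normally be the last step in the pipeline so that input preparation overlaps with training. num_parallel_calls=tf.data.AUTOTUNE lets tf.data choose the parallelism for map(); omitting it runs the map function sequentially, which can make the input pipeline the bottleneck.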
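
The effect of step ordering is easy to inspect on a toy dataset. A minimal sketch with synthetic stand-in data (10 samples, 3 features), using the order shuffle, map, batch, prefetch:

```python
import tensorflow as tf

# Synthetic stand-in data: 10 samples with 3 features each, labels 0..9.
features = tf.random.uniform([10, 3])
labels = tf.range(10)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=10)                 # buffer >= dataset size gives a full shuffle
    .map(lambda x, y: (x * 2.0, y),          # per-example transform, before batching
         num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)              # overlap input preparation with consumption
)

batch_shapes = [tuple(x.shape) for x, _ in dataset]
print(batch_shapes)  # the last batch is smaller because 10 is not a multiple of 4
```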

06 Transfer Learning (Pre-trained Weights)

Limited data, need to leverage ImageNet or BERT; fine-tune existing models.

python

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models, optimizers

base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False  # Freeze weights

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])

# train_data must yield one-hot labels for categorical_crossentropy
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(train_data, epochs=5)

# Fine-tune: unfreeze the last 20 layers, then recompile with a lower learning rate
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy'
)
model.fit(train_data, epochs=10)

Always freeze base_model initially and use lower learning rate when unfreezing. weights='imagenet' requires internet on first run; cache locally.
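
The freeze/unfreeze mechanics can be checked without downloading weights. The sketch below uses a hypothetical two-layer toy_base as a stand-in for a pre-trained model and counts trainable weight tensors in each phase.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for a pre-trained base such as MobileNetV2.
base = keras.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),
], name="toy_base")
model = keras.Sequential([base, layers.Dense(3, activation="softmax")])
model(tf.zeros([1, 32]))                      # run once so every layer creates its weights

all_count = len(model.trainable_weights)      # 3 Dense layers x (kernel, bias)

base.trainable = False                        # phase 1: only the new head trains
frozen_count = len(model.trainable_weights)

base.trainable = True                         # phase 2: unfreeze, then re-freeze
for layer in base.layers[:-1]:                # everything except the last base layer
    layer.trainable = False
partial_count = len(model.trainable_weights)

print(all_count, frozen_count, partial_count)
```

Remember to call model.compile() again after each change to trainable before training with model.fit().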

07 Batch Normalization & Dropout

Stabilizing training, reducing overfitting on deep networks.

python

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(256),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.3),
    layers.Dense(128),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(x_train, y_train, epochs=10)

BatchNormalization must be called with training=True during training, training=False during inference. model.fit() handles this; custom loops must pass training flag.
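
The training flag is easiest to see on Dropout in isolation. A minimal sketch with an all-ones input and rate 0.5: at inference the layer is the identity, while in training each value is either zeroed or scaled by 1 / (1 - rate).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

dropout = layers.Dropout(0.5)
x = tf.ones([1, 8])

inference_out = dropout(x, training=False).numpy()  # identity at inference
training_out = dropout(x, training=True).numpy()    # each value is 0.0 or 2.0

print("inference:", inference_out)
print("training: ", training_out)
```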

Essential Dense & Convolutional Layer Args

Common Keras Layer Parameters

Parameter	Default	Purpose
`units (Dense)`	(required)	Output dimensionality; required for Dense layers
`activation`	None	Activation function ('relu', 'sigmoid', 'tanh', 'softmax', 'linear')
`filters (Conv2D)`	(required)	Number of output channels; required for Conv2D
`kernel_size (Conv2D)`	(required)	Height & width of convolution window (e.g., (3, 3))
`strides (Conv2D)`	(1, 1)	Step size for convolution window; (1, 1) visits every position, (2, 2) downsamples by 2
`padding`	'valid'	'valid' (no padding) or 'same' (zero-pad; keeps spatial dims when strides is 1)
`use_bias`	True	Add bias vector to layer; set False to save parameters
`kernel_regularizer`	None	Weight regularization (keras.regularizers.l1_l2(l1=0.01, l2=0.01))
`rate (Dropout)`	(required)	Fraction of inputs to drop (0.0–1.0); typically 0.2–0.5
`momentum (BatchNorm)`	0.99	Exponential decay for moving mean/variance; higher = smoother estimates
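
The interaction of kernel_size, strides, and padding shows up directly in output shapes. A minimal sketch on a zero-filled 28x28 single-channel input (the sizes are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

images = tf.zeros([1, 28, 28, 1])   # (batch, height, width, channels)

valid = layers.Conv2D(filters=8, kernel_size=(3, 3), padding="valid")(images)
same = layers.Conv2D(filters=8, kernel_size=(3, 3), padding="same")(images)
strided = layers.Conv2D(filters=8, kernel_size=(3, 3), strides=(2, 2), padding="same")(images)

print(tuple(valid.shape))    # (1, 26, 26, 8): 28 - 3 + 1 = 26
print(tuple(same.shape))     # (1, 28, 28, 8)
print(tuple(strided.shape))  # (1, 14, 14, 8): ceil(28 / 2) = 14
```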

TensorFlow Comparison

Aspect	Sequential API	Functional API
Use Case	Linear stack of layers (simple models)	Complex branching, multi-input/output, skip connections
Syntax	Sequential([Layer1(), Layer2()])	inputs → layer1 → layer2 → outputs; Model(inputs, outputs)
Multi-Input	❌ Not supported	✅ Multiple Input() layers
Skip Connections	❌ Not supported	✅ Supported (layers.Add()([x, shortcut]))
Debugging	Simple; linear flow	Plot with plot_model(); trace branching visually
Performance	Same (both compile to same graph)	Same (both compile to same graph)

Common Errors & Fixes

01 ValueError: Input 0 of layer is incompatible with the layer

Cause: Shape mismatch between a layer's output and the next layer's expected input. Dense operates on the last axis, so the error appears when that axis has a different size than the layer was built with, or when image tensors (3D/4D) reach a Dense layer without a Flatten step.

Fix:

Declare the input shape on the first layer and check how layers connect. Use model.summary() to print the output shape of each layer.

python

# Correct: the input shape is declared explicitly
model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128),
    layers.Dense(10)
])

# Risky: no Input layer, so the shape is inferred from the first batch;
# a batch with an unexpected shape only fails at fit/predict time
model = keras.Sequential([
    layers.Dense(128),
])
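
When the data is image-shaped, a Flatten layer in front of the first Dense layer is a common way to avoid this mismatch. A minimal sketch, assuming 28x28 inputs:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28)),   # each sample is a 28x28 image
    layers.Flatten(),               # (None, 28, 28) -> (None, 784)
    layers.Dense(128, activation="relu"),
    layers.Dense(10),
])

model.summary()                     # lists the output shape of every layer
print(model.output_shape)
```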

02 ResourceExhaustedError: OOM when allocating tensor with shape [...] and type float32

Cause: Model, batch size, or dataset exceeds GPU/CPU memory. Large models on small GPUs, or batch_size too large.

Fix:

Reduce batch_size, stream data with tf.data.Dataset and prefetch, or enable mixed precision.

python

from tensorflow.keras import mixed_precision

# Correct: float16 compute reduces activation memory on supported GPUs
mixed_precision.set_global_policy('mixed_float16')

model = keras.Sequential([layers.Dense(1024)])
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, batch_size=16)  # Smaller batch

# With a tf.data.Dataset, set the batch size on the dataset instead of in fit():
# model.fit(dataset.batch(16))

# Wrong: batch too large for available memory; OOM
model.fit(x_train, y_train, batch_size=2048)

03 TypeError: Cannot convert a symbolic Tensor to a numpy array

Cause: A symbolic tensor (a graph placeholder with no concrete value, as found inside @tf.function or while building a Functional model) was passed to NumPy or to code that needs concrete values.

Fix:

Use model.predict(), or read values with .numpy() in eager mode, outside any @tf.function.

python

# Correct: predict() returns a NumPy array
predictions = model.predict(x_test)
print(predictions.shape)

# Correct (eager custom loop): .numpy() extracts the scalar
with tf.GradientTape() as tape:
    y_pred = model(x_batch, training=True)
    loss = loss_fn(y_batch, y_pred)
print(loss.numpy())

# Wrong when run inside @tf.function, where outputs is symbolic
outputs = model(x_test)
numpy_array = np.array(outputs)

04 UnimplementedError: Could not differentiate [...] with respect to any input of the layer

Cause: The source is not a trainable tf.Variable and was not registered with tape.watch(). Constant tensors are ignored unless watched, and NumPy computations are never recorded.

Fix:

Differentiate with respect to a tf.Variable, or call tape.watch() on a tf.Tensor.

python

# Correct: trainable variables are watched automatically
x = tf.Variable([[1.0, 2.0]])
with tf.GradientTape() as tape:
    y = tf.square(x)
grad = tape.gradient(y, x)

# Correct: constant tensors must be watched explicitly
x = tf.convert_to_tensor([[1.0, 2.0]])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.square(x)
grad = tape.gradient(y, x)

# Wrong: a NumPy array becomes an unwatched constant, so no gradient is available
x_np = np.array([[1.0, 2.0]])
with tf.GradientTape() as tape:
    y = tf.square(x_np)

Production Gotchas

⚠ Batch Norm & Dropout behave differently at train vs. inference

BatchNormalization uses running mean/variance at inference; Dropout is disabled. Custom training loops must pass training=True/False. If you forget, inference metrics (accuracy, loss) will be incorrect and won't match training metrics.

⚠ model.predict() is slower than model(x) in loops

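model.predict() adds per-call overhead (it sets up a batched data iterator and converts results to NumPy). For many small inputs, call model(x, training=False) directly or wrap the call in @tf.function for speed. model.predict() is the better fit for large inputs processed in batches, not for repeated single-sample calls inside a loop.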
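
The three call styles can be compared side by side. A minimal sketch with a hypothetical one-layer model and random input; fast_predict is an illustrative name, not a Keras API.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([layers.Input(shape=(4,)), layers.Dense(2)])
x = np.random.rand(3, 4).astype("float32")

batch_preds = model.predict(x, verbose=0)   # NumPy array; iterates over the input in batches
direct_preds = model(x, training=False)     # tf.Tensor; a single forward pass

@tf.function
def fast_predict(inputs):
    # Traced once per input signature, then reused across calls.
    return model(inputs, training=False)

compiled_preds = fast_predict(tf.constant(x))
print(type(batch_preds).__name__, type(direct_preds).__name__)
```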

⚠ GradientTape does not persist across calls

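Each 'with tf.GradientTape()' block creates a new tape that records only the operations run inside it; anything computed outside the block is not recorded, so its gradient is None. A non-persistent tape also releases its resources after the first tape.gradient() call, and a second call raises RuntimeError. Use persistent=True if you need multiple .gradient() calls.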
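
A minimal sketch of the difference between a default tape and a persistent one, using a single scalar variable:

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Non-persistent tape: one gradient() call, then its resources are released.
with tf.GradientTape() as tape:
    y = x * x
dy_dx = tape.gradient(y, x)          # 2x = 6.0
try:
    tape.gradient(y, x)              # second call on the same tape
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# Persistent tape: several gradient() calls are allowed.
with tf.GradientTape(persistent=True) as ptape:
    y = x * x
    z = y * y                        # z = x^4
dz_dx = ptape.gradient(z, x)         # 4x^3 = 108.0
dy_dx_again = ptape.gradient(y, x)   # 2x = 6.0
del ptape                            # release the tape's resources explicitly

print(float(dy_dx), second_call_failed, float(dz_dx), float(dy_dx_again))
```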

⚠ Freezing weights requires recompilation

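Changes to layer.trainable take effect in model.fit() only after model.compile() is called again, because compile() fixes the set of trainable weights for training. Recompiling with a new optimizer (including a string such as 'adam') discards the old optimizer state and uses that optimizer's learning rate, so pass an explicit, lower learning rate when fine-tuning.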

⚠ Input shape must not include the batch dimension

The shape passed to the first layer describes one sample and should NOT include the batch size: use shape=(784,), not shape=(None, 784). Keras prepends the batch dimension as None automatically, so shape=(None, 784) declares an extra variable-length axis and causes shape mismatch errors.
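
Printing the symbolic input's shape shows what Keras adds. A minimal sketch:

```python
from tensorflow import keras

inputs = keras.Input(shape=(784,))                 # per-sample shape only
print(tuple(inputs.shape))                         # batch axis is prepended as None

sequence_inputs = keras.Input(shape=(None, 784))   # extra variable-length axis
print(tuple(sequence_inputs.shape))
```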

⚠ Callbacks monitor parameter must match output name

EarlyStopping(monitor='val_loss') only logs a warning and never triggers if 'val_loss' is not among the logged metrics (for example, when no validation data is passed). Capture the return value of fit() and print(history.history.keys()) to verify metric names (e.g., 'loss', 'accuracy', 'val_accuracy').
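
One way to check which names are available to monitor is a short one-epoch run. A minimal sketch on random data (the model and data are placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 3, size=(100,))

model = keras.Sequential([layers.Input(shape=(8,)), layers.Dense(3, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# validation_split is what makes the val_* entries appear.
history = model.fit(x, y, validation_split=0.2, epochs=1, verbose=0)
print(sorted(history.history.keys()))
```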

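⚠ tf.data.Dataset.map() with tf.py_function limits performance

Dataset.map() traces its function into a graph. Code wrapped in tf.py_function still runs in the Python interpreter under the global interpreter lock, so it gains little from num_parallel_calls and cannot be serialized with the graph. Wrapping it in @tf.function does not change this; prefer native TensorFlow ops (tf.image, tf.strings) over NumPy.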
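
The two approaches give the same values, which makes replacing the Python path straightforward. A minimal sketch; numpy_scale and scale_with_py_function are hypothetical helpers.

```python
import numpy as np
import tensorflow as tf

pixels = tf.constant([[0.0, 127.5, 255.0]])
raw = tf.data.Dataset.from_tensor_slices(pixels)

def numpy_scale(x):
    # Runs in the Python interpreter; x arrives as an eager tensor.
    return (x.numpy() / 255.0).astype("float32")

def scale_with_py_function(x):
    y = tf.py_function(numpy_scale, [x], tf.float32)
    y.set_shape(x.shape)          # py_function drops static shape information
    return y

via_python = raw.map(scale_with_py_function)
via_tf_ops = raw.map(lambda x: x / 255.0, num_parallel_calls=tf.data.AUTOTUNE)

python_result = next(iter(via_python)).numpy()
tf_result = next(iter(via_tf_ops)).numpy()
print(python_result, tf_result)
```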

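⚠ Models with Lambda layers do not save and load portably

Lambda layers contain arbitrary Python code, which model.save() can only store as serialized bytecode. That bytecode may fail to load under a different Python version or environment, and Keras 3 refuses to deserialize it unless safe_mode=False is passed to load_model(). Use custom layers (tf.keras.layers.Layer subclass with get_config()) instead of Lambda for production models.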
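
A minimal sketch of such a custom layer, assuming the Lambda only multiplied its input by a constant; the Scale class and its factor argument are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

class Scale(layers.Layer):
    """Hypothetical replacement for Lambda(lambda x: x * factor)."""

    def __init__(self, factor=1.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Everything needed to rebuild the layer is plain, serializable data.
        config = super().get_config()
        config.update({"factor": self.factor})
        return config

layer = Scale(factor=2.0)
restored = Scale.from_config(layer.get_config())   # round-trip without Python bytecode
result = restored(tf.constant([1.0, 2.0])).numpy()
print(result)
```

When saving a full model that contains the layer, pass the class through custom_objects at load time or register it with Keras so it can be found by name.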

Verified 2026-04 · v2.16.x
