TensorFlow Cheat Sheet — Deep Learning Fundamentals
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks

Compute graphs and automatic differentiation for neural networks at scale.
Like a factory assembly line: raw materials (inputs) flow through stations (layers), get transformed (operations), and feedback flows backward (gradients) to optimize each station.
Key Concepts
TensorFlow Patterns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
layers.Input(shape=(784,)),
layers.Dense(128, activation='relu'),
layers.Dropout(0.2),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
model.fit(x_train, y_train, epochs=10, batch_size=32)
Sample output:
Epoch 1/10
32/32 [==============================] - 0s 1ms/step - loss: 2.3041 - accuracy: 0.1089

from tensorflow.keras import layers, models, Input
inputs = Input(shape=(784,))
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dropout(0.2)(x)
branch1 = layers.Dense(64, activation='relu')(x)
branch2 = layers.Dense(64, activation='relu')(x)
merged = layers.Concatenate()([branch1, branch2])
outputs = layers.Dense(10, activation='softmax')(merged)
model = models.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])  # Expects one-hot labels

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers
model = keras.Sequential([layers.Dense(10)])  # No softmax: the model outputs raw logits
optimizer = optimizers.Adam(learning_rate=0.01)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)  # Build the pipeline once
for epoch in range(10):
    for x_batch, y_batch in train_ds:
        with tf.GradientTape() as tape:
            logits = model(x_batch, training=True)
            loss_value = loss_fn(y_batch, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
    print(f'Epoch {epoch + 1}: last-batch loss {float(loss_value):.4f}')

from tensorflow.keras.callbacks import (
EarlyStopping,
ModelCheckpoint,
ReduceLROnPlateau,
TensorBoard
)
callbacks_list = [
EarlyStopping(
monitor='val_loss',
patience=5,
restore_best_weights=True
),
ModelCheckpoint(
'best_model.keras',
monitor='val_accuracy',
save_best_only=True
),
ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=3,
min_lr=1e-7
),
TensorBoard(log_dir='./logs')
]
model.fit(
x_train, y_train,
validation_split=0.2,
epochs=100,
callbacks=callbacks_list
)

import tensorflow as tf
# Assumes x_train holds images shaped (num_samples, height, width, channels)
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.map(
    lambda x, y: (tf.image.random_flip_left_right(x), y),  # Augment each example before batching
    num_parallel_calls=tf.data.AUTOTUNE
)
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)  # Overlap preprocessing with training
model.fit(dataset, epochs=10)

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models, optimizers
base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False  # Freeze pretrained weights
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(train_data, epochs=5)
# Fine-tune: unfreeze only the last 20 layers of the base model
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False
# Recompile so the change takes effect; a low learning rate avoids wrecking the pretrained features
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5), loss='categorical_crossentropy')
model.fit(train_data, epochs=10)

from tensorflow.keras import layers, models
model = models.Sequential([
layers.Input(shape=(784,)),
layers.Dense(256),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.Dropout(0.3),
layers.Dense(128),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.Dropout(0.3),
layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(x_train, y_train, epochs=10)

Essential Dense & Convolutional Layer Args
Common Keras Layer Parameters
| Parameter | Default | Purpose |
|---|---|---|
| units (Dense) | required | Output dimensionality |
| activation | None | Activation function ('relu', 'sigmoid', 'tanh', 'softmax'); None means linear (no activation) |
| filters (Conv2D) | required | Number of output channels |
| kernel_size (Conv2D) | required | Height and width of the convolution window, e.g. (3, 3) |
| strides (Conv2D) | 1, i.e. (1, 1) | Step of the convolution window; (1, 1) visits every position, (2, 2) halves the spatial size |
| padding (Conv2D) | 'valid' | 'valid' (no padding, output shrinks) or 'same' (zero-pad so output size = ceil(input / stride), i.e. unchanged at stride 1) |
| use_bias | True | Add a bias vector; often set to False when the layer is followed by BatchNormalization, which makes the bias redundant |
| kernel_regularizer | None | Weight penalty added to the loss, e.g. keras.regularizers.l1_l2(l1=0.01, l2=0.01) |
| rate (Dropout) | required | Fraction of inputs to drop (0.0 to 1.0); typically 0.2 to 0.5 |
| momentum (BatchNormalization) | 0.99 | Decay for the moving mean/variance; higher values give smoother, slower-adapting estimates |
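A quick way to check how padding, strides, and use_bias affect a Conv2D layer is to call it on a dummy tensor. This sketch assumes a 32x32 RGB input and 16 filters, both chosen only for illustration; the shapes follow from input - kernel_size + 1 for 'valid' at stride 1 and ceil(input / stride) for 'same'.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Dummy batch: 1 image, 32x32 pixels, 3 channels (illustrative sizes)
images = tf.zeros((1, 32, 32, 3))

# 'valid', stride 1: each spatial dim shrinks by kernel_size - 1, so 32 -> 30
valid = layers.Conv2D(filters=16, kernel_size=(3, 3), padding='valid')(images)

# 'same', stride 1: zero-padding keeps the spatial dims at 32
same = layers.Conv2D(filters=16, kernel_size=(3, 3), padding='same')(images)

# 'same', stride 2: output is ceil(32 / 2) = 16
down = layers.Conv2D(filters=16, kernel_size=(3, 3), strides=(2, 2), padding='same')(images)

# use_bias=False drops the 16 bias terms: 3 * 3 * 3 * 16 = 432 weights instead of 448
conv_no_bias = layers.Conv2D(filters=16, kernel_size=(3, 3), use_bias=False)
conv_no_bias.build((None, 32, 32, 3))

print(tuple(valid.shape), tuple(same.shape), tuple(down.shape))
print(conv_no_bias.count_params())
```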
TensorFlow Comparison
| Aspect | Sequential API | Functional API |
|---|---|---|
| Use Case | Linear stack of layers (simple models) | Complex branching, multi-input/output, skip connections |
| Syntax | Sequential([Layer1(), Layer2()]) | inputs → layer1 → layer2 → outputs; Model(inputs, outputs) |
| Multi-Input | ❌ Not supported | ✅ Multiple Input() layers |
| Skip Connections | ❌ Not supported | ✅ Supported, e.g. layers.Add()([x, shortcut]) |
| Debugging | Simple; linear flow | Plot with plot_model(); trace branching visually |
| Performance | Same (both compile to same graph) | Same (both compile to same graph) |
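A minimal Functional API sketch of the two rows where the APIs differ: a skip connection and a two-input model. Layer sizes and the input names image_features and text_features are hypothetical, chosen only to make the shapes easy to follow.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Skip connection: the block's input is added back to its output
inputs = keras.Input(shape=(64,))
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64)(x)
x = layers.Add()([x, inputs])  # Shapes must match: (None, 64) + (None, 64)
x = layers.Activation('relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Multi-input: two Input() layers merged with Concatenate
image_features = keras.Input(shape=(128,), name='image_features')
text_features = keras.Input(shape=(32,), name='text_features')
merged = layers.Concatenate()([image_features, text_features])
score = layers.Dense(1, activation='sigmoid')(merged)
two_input_model = keras.Model(inputs=[image_features, text_features], outputs=score)

preds = model.predict(tf.random.normal((4, 64)), verbose=0)
scores = two_input_model.predict(
    [tf.random.normal((4, 128)), tf.random.normal((4, 32))], verbose=0
)
print(preds.shape, scores.shape)  # (4, 10) (4, 1)
```

The list passed to two_input_model.predict() follows the order of the inputs list given to keras.Model.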
Common Errors & Fixes
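ValueError: Input 0 of layer is incompatible with the layer
Cause: Shape mismatch between a layer's output and the next layer's expected input. Dense operates on the last axis, so the error appears when that axis differs from what the layer was built for, e.g. feeding unflattened (28, 28) images to a model that expects 784 features.
Declare the input shape explicitly on the first layer and check the layer connections: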
✅ CORRECT:
model = keras.Sequential([
layers.Input(shape=(784,)), # Explicitly set shape
layers.Dense(128),
layers.Dense(10)
])
❌ WRONG:
model = keras.Sequential([
layers.Dense(128),  # No Input: weights are built lazily on the first batch, so shape errors surface only at fit/predict time
])
Use model.summary() to print the output shape of each layer.

ResourceExhaustedError: OOM when allocating tensor with shape [...] and type float32
Cause: The model, the batch, or data loaded into memory exceeds GPU/CPU memory, typically a large model on a small GPU or a batch_size that is too large.
Reduce the batch size, stream data with tf.data instead of loading the full dataset into memory, or enable mixed precision:
✅ CORRECT:
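from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)  # float16 computation, float32 weights
model = keras.Sequential([
    layers.Dense(1024),
    layers.Activation('linear', dtype='float32')  # Keep outputs in float32 for a numerically stable loss
])
model.compile(optimizer='adam', loss='mse')
model.fit(dataset.batch(16))  # Smaller batches; a tf.data pipeline sets its batch size with .batch()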
❌ WRONG:
model.fit(dataset.batch(2048))  # Batch too large for GPU memory; OOM

TypeError: Cannot convert a symbolic Tensor to a numpy array
Cause: A NumPy operation (np.array(), .numpy()) is applied to a symbolic tensor, i.e. one created while building a Functional model or tracing a @tf.function, which has no concrete value yet.
Use model.predict(), or run the model eagerly and call .numpy() on the result (not inside @tf.function):
✅ CORRECT:
predictions = model.predict(x_test) # Returns numpy array
print(predictions.shape)
✅ CORRECT (custom loop):
with tf.GradientTape() as tape:
    y_pred = model(x_batch, training=True)
    loss = loss_fn(y_batch, y_pred)
print(loss.numpy())  # .numpy() extracts the scalar (eager mode only)
❌ WRONG:
outputs = model(x_test)
numpy_array = np.array(outputs)  # Fails if outputs is symbolic

UnimplementedError: Could not differentiate [...] with respect to any input of the layer
Cause: The tensor is neither a tf.Variable nor watched by the GradientTape. Constant tensors and NumPy arrays are not tracked automatically.
Differentiate with respect to a tf.Variable, or call tape.watch() on a plain tensor:
✅ CORRECT:
x = tf.Variable([[1.0, 2.0]])
with tf.GradientTape() as tape:
    y = tf.square(x)
grad = tape.gradient(y, x)  # Works: [[2.0, 4.0]]
✅ CORRECT:
with tf.GradientTape() as tape:
    x = tf.convert_to_tensor([[1.0, 2.0]])
    tape.watch(x)  # Constants are not tracked unless explicitly watched
    y = tf.square(x)
grad = tape.gradient(y, x)  # Works: [[2.0, 4.0]]
❌ WRONG:
x_np = np.array([[1.0, 2.0]])
with tf.GradientTape() as tape:
    y = tf.square(x_np)  # x_np is a NumPy array, not a watched tensor: no gradient flows back to it

Production Gotchas
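At inference, BatchNormalization uses its moving mean/variance and Dropout is disabled. Custom training loops must pass training=True in the training step and training=False at evaluation (False is also the behavior when the argument is omitted). Omitting training=True while training silently disables Dropout and stops BatchNormalization from updating its statistics; passing training=True at evaluation makes metrics noisy and unrepresentative of inference.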
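A small sketch, assuming TF 2.x eager execution, of what the training flag changes when layers are called directly as in a custom loop. The tensor sizes and the random seed are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(0)

# Dropout: active only when training=True
dropout = layers.Dropout(0.5)
x = tf.ones((1, 1000))
train_out = dropout(x, training=True).numpy()   # each value is 0.0 or 1 / (1 - 0.5) = 2.0
infer_out = dropout(x, training=False).numpy()  # identity
default_out = dropout(x).numpy()                # omitted flag behaves like training=False here

# BatchNormalization: moving statistics change only when training=True
bn = layers.BatchNormalization()  # momentum defaults to 0.99
batch = tf.random.normal((64, 4), mean=5.0)
bn(batch, training=False)                       # builds the layer; moving_mean stays at zeros
mean_before = bn.moving_mean.numpy().copy()
bn(batch, training=True)                        # moving_mean = 0.99 * moving_mean + 0.01 * batch_mean
mean_after = bn.moving_mean.numpy()

print(np.unique(train_out))                     # values drawn from {0.0, 2.0}
print(np.array_equal(infer_out, x.numpy()), np.array_equal(default_out, x.numpy()))
print(mean_before, mean_after)                  # zeros, then roughly 0.05 per feature
```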
model.predict() is designed for large inputs: it splits the data into batches and iterates over them, which adds per-call overhead. For small inputs that fit in one batch, especially inside a loop, call model(x, training=False) directly, optionally wrapped in @tf.function for speed.
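One way to split the two use cases, as a sketch: a tf.function wrapper for small repeated calls and predict() for a large array. The model, the helper name fast_predict, and the array sizes are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([layers.Input(shape=(8,)), layers.Dense(4)])

# Small input, called repeatedly: call the model directly inside a tf.function
@tf.function
def fast_predict(x):
    return model(x, training=False)

x_small = np.random.rand(2, 8).astype('float32')
direct = fast_predict(x_small).numpy()

# Large input: predict() slices it into batches and returns one NumPy array
x_large = np.random.rand(10_000, 8).astype('float32')
batched = model.predict(x_large, batch_size=1024, verbose=0)

# Both paths run the same forward pass
same = np.allclose(direct, model.predict(x_small, verbose=0), atol=1e-5)
print(direct.shape, batched.shape, same)  # (2, 4) (10000, 4) True
```

A tf.function retraces when the input shape changes, so the speedup holds when repeated calls use a consistent shape.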
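Only operations executed inside the 'with tf.GradientTape()' block are recorded; a forward pass computed outside it yields None gradients. Calling tape.gradient() after the block has exited is fine and is the usual pattern. A non-persistent tape can compute gradients only once (a second .gradient() call raises RuntimeError); use persistent=True, then del the tape, if you need several gradient calls.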
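The sketch below checks both tape behaviors on a scalar variable: which operations get recorded, and how many gradient() calls each kind of tape allows.

```python
import tensorflow as tf

x = tf.Variable(3.0)

# 1) Only ops executed inside the `with` block are recorded.
with tf.GradientTape(persistent=True) as tape:
    y = x * x                  # recorded: y = x^2
z = y * x                      # outside the block: not recorded
grad_y = tape.gradient(y, x)   # gradient() after the block is the normal pattern: 2x = 6.0
grad_z = tape.gradient(z, x)   # None, because the op producing z was never recorded
del tape                       # Release a persistent tape's resources when done

# 2) A non-persistent tape allows exactly one gradient() call.
with tf.GradientTape() as tape:
    y = x * x
first = tape.gradient(y, x)
try:
    tape.gradient(y, x)
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

print(float(grad_y), grad_z, float(first), second_call_failed)  # 6.0 None 6.0 True
```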
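When training with model.fit(), a change to layer.trainable only takes effect after you call model.compile() again. Recompiling with a string such as optimizer='adam' creates a new optimizer: its state (e.g. Adam's moment estimates) is discarded and the learning rate returns to the default. Pass an explicitly configured optimizer instead, e.g. with a low learning rate for fine-tuning.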
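A sketch of freezing one layer and recompiling with an explicit optimizer, trained on random data. The layer names 'feature' and 'head' and the learning rate of 1e-5 are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, optimizers

x = np.random.rand(64, 8).astype('float32')
y = np.random.rand(64, 1).astype('float32')

model = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(16, activation='relu', name='feature'),
    layers.Dense(1, name='head'),
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1, verbose=0)

# Freeze the first layer, then recompile so fit() picks up the change.
model.get_layer('feature').trainable = False
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5), loss='mse')  # Explicit, low learning rate

frozen_before = model.get_layer('feature').get_weights()[0].copy()
model.fit(x, y, epochs=1, verbose=0)
frozen_after = model.get_layer('feature').get_weights()[0]

print(len(model.trainable_weights))                 # 2: the head's kernel and bias
print(np.array_equal(frozen_before, frozen_after))  # True: the frozen kernel did not move
```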
The shape given to Input() (or input_shape) excludes the batch size: shape=(784,), not shape=(None, 784). Keras adds the batch dimension as None itself, so including it creates an extra axis and leads to shape mismatch errors. If you really need a fixed batch size, pass batch_size to Input().
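Inspecting the symbolic shapes shows where Keras puts the batch axis. A short sketch:

```python
from tensorflow import keras

# Correct: shape excludes the batch axis; Keras prepends it as None.
good = keras.Input(shape=(784,))
# Including None adds an extra axis: this is a 3-D input, not a batch of vectors.
extra_axis = keras.Input(shape=(None, 784))
# If a fixed batch size is really needed, pass it separately.
fixed = keras.Input(shape=(784,), batch_size=32)

print(tuple(good.shape))        # (None, 784)
print(tuple(extra_axis.shape))  # (None, None, 784)
print(tuple(fixed.shape))       # (32, 784)
```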
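EarlyStopping(monitor='val_loss') never triggers if 'val_loss' is not among the logged metrics (for example, when fit() has no validation data); Keras only logs a warning. Check the available names with print(history.history.keys()), where history is the object returned by model.fit() (e.g. 'loss', 'accuracy', 'val_loss', 'val_accuracy').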
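A sketch for checking metric names before relying on them in a callback. The random data and the tiny model are hypothetical stand-ins; only the presence or absence of the 'val_*' keys matters here.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

x = np.random.rand(100, 4).astype('float32')
y = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([layers.Input(shape=(4,)), layers.Dense(2, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Without validation data there are no 'val_*' keys to monitor.
first_keys = set(model.fit(x, y, epochs=1, verbose=0).history.keys())
print(sorted(first_keys))   # includes 'loss' and 'accuracy', but no 'val_loss'

# With validation_split, 'val_loss' and 'val_accuracy' appear and EarlyStopping can use them.
history = model.fit(
    x, y, epochs=3, verbose=0, validation_split=0.2,
    callbacks=[EarlyStopping(monitor='val_loss', patience=1)],
)
second_keys = set(history.history.keys())
print(sorted(second_keys))
```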
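Dataset.map() traces the mapped function into a graph, so plain Python or NumPy logic on tensor values either fails on symbolic tensors or runs only once at trace time. tf.py_function runs eager Python instead, but it holds the Python GIL, limits parallelism, and its body is not serialized with the graph. Prefer TensorFlow ops (tf.image, tf.strings, tf.math) inside .map().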
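A sketch contrasting a pure-TensorFlow preprocessing function with a tf.py_function fallback. The functions preprocess, numpy_only_step, and wrapped, and the random 28x28 images, are hypothetical stand-ins for real preprocessing.

```python
import numpy as np
import tensorflow as tf

images = np.random.randint(0, 256, size=(8, 28, 28, 1)).astype('uint8')
labels = np.arange(8)

def preprocess(image, label):
    # Pure TensorFlow ops: traced once into the graph and run in parallel without the GIL
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.random_flip_left_right(image)
    return image, label

ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

# Fallback for logic with no TF equivalent: tf.py_function runs eager Python,
# but holds the GIL and loses the static shape, which must be restored by hand.
def numpy_only_step(image):
    return np.clip(image.numpy() * 1.1, 0.0, 1.0).astype('float32')

def wrapped(image, label):
    image = tf.py_function(numpy_only_step, [image], tf.float32)
    image.set_shape((28, 28, 1))
    return image, label

slow_ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(preprocess).map(wrapped).batch(4)

first_images, first_labels = next(iter(ds))
slow_images, _ = next(iter(slow_ds))
print(first_images.shape, slow_images.shape)
```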
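Lambda layers wrap arbitrary Python code, which Keras saves as Python bytecode: the saved model may not load under a different Python version, and recent Keras versions refuse to load Lambda layers from the .keras format unless you pass safe_mode=False to load_model(). Use a custom Layer subclass (tf.keras.layers.Layer) with get_config() for production models.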
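A sketch of a serializable replacement for a Lambda layer such as layers.Lambda(lambda x: x * factor). ScaleLayer, its factor argument, and the 'cheatsheet' registration package are hypothetical names; the sketch assumes a recent Keras version that saves to the .keras format.

```python
import os
import tempfile
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical layer: multiplies its input by a fixed factor
@keras.utils.register_keras_serializable(package='cheatsheet')
class ScaleLayer(layers.Layer):
    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Everything needed to rebuild the layer, stored as plain JSON-friendly values
        config = super().get_config()
        config.update({'factor': self.factor})
        return config

model = keras.Sequential([layers.Input(shape=(3,)), ScaleLayer(factor=3.0)])
path = os.path.join(tempfile.mkdtemp(), 'scaled.keras')
model.save(path)
restored = keras.models.load_model(path)  # No safe_mode=False needed: no Python bytecode is stored

x = np.array([[1.0, 2.0, 3.0]], dtype='float32')
result = restored.predict(x, verbose=0)
print(result)
```

Registering the class lets load_model() find it by name without passing custom_objects.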