What a tensor is: N-dimensional array
Why this matters
Everything in PyTorch is a tensor: model weights, inputs, outputs, gradients. Understanding how tensors work at the mechanical level is a prerequisite to building any neural network or training loop.
Explanation
A tensor is PyTorch's core data structure: think of it as a generalization of a NumPy array that can live on either CPU or GPU and can take part in automatic differentiation. A 0D tensor is a scalar (a single number), a 1D tensor is a vector, a 2D tensor is a matrix, and tensors with three or more dimensions are higher-dimensional arrays. Mechanically, when you create a tensor with torch.tensor(), PyTorch allocates memory, copies the data into it, and wraps it with metadata (shape, dtype, device, requires_grad flag). Unlike NumPy arrays, tensors with requires_grad=True track the mathematical operations performed on them so that gradients can flow backward through a computation graph during training. You use tensors whenever you need GPU computation, automatic differentiation, or both, which covers essentially every neural network training loop.
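To make the metadata and the gradient tracking concrete, here is a minimal sketch. The function y = sum(x^2) is chosen purely for illustration (it is not part of the lesson's own code), because its gradient, 2x, is easy to check by hand.

```python
import torch

# A small 2D tensor; requires_grad=True asks autograd to record operations on it.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

# Metadata that travels with the data.
print(x.shape)          # torch.Size([2, 2])
print(x.dtype)          # torch.float32
print(x.device)         # cpu, unless a different default device was configured
print(x.requires_grad)  # True

# Illustrative function: y = sum(x^2), so dy/dx = 2x.
y = (x ** 2).sum()
y.backward()            # walk the recorded graph backward

print(x.grad)           # elementwise 2 * x
```

After backward(), x.grad holds one gradient value per element of x, which is what an optimizer would read in a training loop.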
Analogy
If NumPy arrays are like raw plywood, tensors are like smart lumber that remembers every tool used to shape it, so you can rewind and improve your craftsmanship.
Code
import torch

# 0D tensor: a scalar
scalar = torch.tensor(5.0)
print(f"Scalar shape: {scalar.shape}, value: {scalar}")

# 1D tensor: a vector
vector = torch.tensor([1.0, 2.0, 3.0])
print(f"Vector shape: {vector.shape}")
print(f"Vector: {vector}")

# 2D tensor: a matrix with 2 rows and 3 columns
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
print(f"Matrix shape: {matrix.shape}")
print(f"Matrix:\n{matrix}")

# 3D tensor filled with random values
tensor_3d = torch.randn(2, 3, 4)
print(f"3D tensor shape: {tensor_3d.shape}")

# Move the tensor to the GPU when one is available; otherwise it stays on the CPU
tensor_gpu = torch.tensor([1.0, 2.0, 3.0])
if torch.cuda.is_available():
    tensor_gpu = tensor_gpu.to('cuda')
    print(f"Tensor on device: {tensor_gpu.device}")
else:
    print(f"Tensor on device: {tensor_gpu.device}")

# Ask autograd to track operations on this tensor
grad_tensor = torch.tensor([2.0, 3.0, 4.0], requires_grad=True)
print(f"\nGrad tracking enabled: {grad_tensor.requires_grad}")
print(f"Dtype: {grad_tensor.dtype}")

Output
Scalar shape: torch.Size([]), value: tensor(5.)
Vector shape: torch.Size([3])
Vector: tensor([1., 2., 3.])
Matrix shape: torch.Size([2, 3])
Matrix:
tensor([[1., 2., 3.],
        [4., 5., 6.]])
3D tensor shape: torch.Size([2, 3, 4])
Tensor on device: cpu

Grad tracking enabled: True
Dtype: torch.float32
What just happened?
The code created tensors of increasing dimensionality (scalar, vector, 2D matrix, 3D array), printed their shapes and values, tried to move a tensor to the GPU (none was available in this environment, so it stayed on the CPU), and demonstrated the requires_grad=True flag that tells PyTorch to track operations on that tensor for backpropagation. The shape printed for each tensor lists the size of every dimension, and the final line shows that floating-point literals produce dtype torch.float32.
Common gotcha
Developers often wonder whether tensor.shape and tensor.size() differ: they return the same torch.Size. More critically, a tensor created from a Python list defaults to dtype=torch.float32 only when the list contains floating-point numbers; a list of Python integers becomes torch.int64. If you do arithmetic with integer tensors and expect floats, operations such as floor division return integer results and discard the fractional part, and integer tensors cannot require gradients at all. Be explicit about dtype when it matters, e.g., torch.tensor([1, 2, 3], dtype=torch.float32).
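The dtype gotcha is easy to reproduce. The sketch below uses illustrative variable names and shows the inferred dtypes, an integer result that drops the fraction, and the explicit-dtype fix.

```python
import torch

ints = torch.tensor([1, 2, 3])          # Python ints are inferred as torch.int64
floats = torch.tensor([1.0, 2.0, 3.0])  # Python floats are inferred as torch.float32
print(ints.dtype, floats.dtype)

# Integer arithmetic stays integer: floor division discards the fraction.
halves_int = ints // 2
print(halves_int)                       # tensor([0, 1, 1])

# Being explicit about dtype keeps the fractional part.
explicit = torch.tensor([1, 2, 3], dtype=torch.float32)
halves = explicit / 2
print(halves)                           # tensor([0.5000, 1.0000, 1.5000])

# Integer tensors cannot take part in gradient tracking.
int_grad_rejected = False
try:
    torch.tensor([1, 2, 3], requires_grad=True)
except RuntimeError as err:
    int_grad_rejected = True
    print(f"Rejected: {err}")

# shape and size() report the same thing.
print(explicit.shape == explicit.size())  # True
```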
Error recovery
RuntimeError: Expected a scalar tensor
TypeError: can't convert cuda:0 device type tensor to numpy
RuntimeError: leaf variable has been moved into the graph interior
Experienced dev note
A tensor is not just data: it can be a node in a computation graph. Once a tensor has requires_grad=True, PyTorch records every operation that touches it in a directed acyclic graph and keeps the intermediate values it needs for the backward pass. This is powerful for training but adds silent memory overhead if you're not careful. Wrap inference code in with torch.no_grad(): and call .detach() explicitly when you want to break the graph. Many performance bugs come from accidentally keeping the graph alive on tensors you don't need to train.
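A minimal sketch of the two tools mentioned here, using a made-up weight tensor w. It only inspects the requires_grad and grad_fn attributes, so it runs the same on CPU and GPU.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Normal operation: the result is attached to the graph.
tracked = w * 2
print(tracked.requires_grad)        # True
print(tracked.grad_fn is not None)  # True: it remembers how it was produced

# Inside no_grad(), nothing is recorded, so no graph memory is kept.
with torch.no_grad():
    untracked = w * 2
print(untracked.requires_grad)      # False

# detach() returns a tensor that shares the same data but is cut off from the graph.
detached = tracked.detach()
print(detached.requires_grad)       # False
print(detached.grad_fn)             # None
```

Use no_grad() for whole regions of code such as evaluation loops, and detach() for individual tensors you want to log or store without holding on to their history.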
Check your understanding
If you create a tensor with shape (3, 4) and requires_grad=True, then perform an in-place operation like t += 1 on it, what happens, and why do you get an error?
Answer hint
In-place operations modify a tensor's values without creating a new tensor. PyTorch raises a RuntimeError for in-place ops on leaf tensors that require gradients: recording the update would turn the leaf into an intermediate result of the graph and leave no place for its gradient to accumulate. The correct approach is to use an out-of-place operation such as t = t + 1, or, when the update should not be tracked, to perform it under torch.no_grad() or on t.detach().
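The behavior in the answer can be checked with a short sketch. The names t and u are illustrative, and the exact error text may vary between PyTorch versions, so the code prints it rather than assuming its wording.

```python
import torch

t = torch.ones(3, 4, requires_grad=True)

# In-place update on a leaf tensor that requires grad: autograd refuses.
rejected = False
try:
    t += 1
except RuntimeError as err:
    rejected = True
    print(f"In-place update rejected: {err}")

# Out-of-place alternative: u is a new tensor and the graph stays intact.
u = t + 1
u.sum().backward()
print(t.grad)  # all ones, since d(sum(t + 1))/dt = 1

# Deliberate untracked update, similar in spirit to what an optimizer step does.
with torch.no_grad():
    t += 1
print(t.is_leaf, t.requires_grad)  # True True
```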
Older code used torch.autograd.Variable() to wrap tensors for gradient tracking. Since PyTorch 0.4.0, Tensor and Variable are merged: no wrapping is needed, and a tensor tracks gradients whenever requires_grad=True is set. The old pattern Variable(tensor, requires_grad=True) is obsolete.
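When updating older code, the wrapper can usually be replaced by one of the two patterns sketched below. The tensors a and b and the toy loss are illustrative only.

```python
import torch

# Pattern 1: request gradient tracking at creation time.
a = torch.tensor([1.0, 2.0], requires_grad=True)

# Pattern 2: switch tracking on for an existing tensor, in place.
b = torch.ones(2)
b.requires_grad_()

# Toy loss: sum(a * b), so d(loss)/da = b and d(loss)/db = a.
loss = (a * b).sum()
loss.backward()

print(a.grad)  # tensor([1., 1.])
print(b.grad)  # tensor([1., 2.])
```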