Debug Fix beginner · 3 min read

How to handle missing values in pandas

Quick answer
Use pandas methods like dropna() to remove missing values or fillna() to replace them with specific values or strategies. For more advanced handling, use interpolate() to estimate missing data based on existing values.
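Here is a minimal sketch of the three methods side by side on a small Series. The values are made up for illustration, and interpolate() is shown with its default linear method, which treats the values as evenly spaced:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0, None, 5.0])  # None is stored as NaN

dropped = s.dropna()            # removes missing entries; original index labels are kept
filled = s.fillna(0.0)          # replaces every NaN with a constant
interpolated = s.interpolate()  # linear estimate from the neighbouring values

print(dropped.tolist())        # [1.0, 3.0, 5.0]
print(filled.tolist())         # [1.0, 0.0, 3.0, 0.0, 5.0]
print(interpolated.tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0]
```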
ERROR TYPE code_error
⚡ QUICK FIX
Use df.fillna() or df.dropna() to handle missing values explicitly before model training.

Why this happens

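Missing values in a pandas DataFrame come from incomplete or corrupted source data. pandas represents them as NaN, and a None in a numeric column is stored as NaN. torch.tensor() does not reject NaN. It copies the values into the tensor unchanged, so the problem shows up later: a single NaN propagates through the forward pass and turns the loss and gradients into NaN. Columns with an object dtype, such as mixed strings and numbers, fail earlier, at conversion time.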
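The sketch below uses made-up values and only pandas and NumPy. It shows how a None becomes NaN and why the problem is easy to miss: pandas reductions skip NaN by default, while plain NumPy arithmetic propagates it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'feature1': [1, 2, None, 4]})

# pandas stores the None as NaN and upcasts the column to float64
print(df['feature1'].dtype)            # float64
print(df['feature1'].isna().tolist())  # [False, False, True, False]

# pandas reductions skip NaN by default...
print(df['feature1'].sum())            # 7.0

# ...but plain NumPy arithmetic propagates it
arr = df['feature1'].to_numpy()
print(np.sum(arr))                     # nan
```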

Example where NaN slips through unnoticed:

python
import pandas as pd
import torch

df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})
tensor = torch.tensor(df.values)  # no error is raised: NaN is copied into the tensor
print(tensor)
output
tensor([[1., nan],
        [2., 2.],
        [nan, 3.],
        [4., 4.]], dtype=torch.float64)

The fix

Use dropna() to remove rows with missing values or fillna() to replace missing values with a constant or a computed value like the mean. This ensures the DataFrame has no NaN before converting to tensors.
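Both methods take options that control what gets dropped or filled. The sketch below runs them on the same example DataFrame. subset= restricts which columns dropna() inspects, and a per-column dict gives fillna() a different value for each column. The choice of 0 and the median is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})

# Drop every row that contains at least one NaN (how="any" is the default)
print(df.dropna().index.tolist())                     # [1, 3]

# Only look at feature1 when deciding which rows to drop
print(df.dropna(subset=['feature1']).index.tolist())  # [0, 1, 3]

# Fill each column with its own value: a constant for feature1, the median for feature2
filled = df.fillna({'feature1': 0, 'feature2': df['feature2'].median()})
print(filled.to_numpy().tolist())
# [[1.0, 3.0], [2.0, 2.0], [0.0, 3.0], [4.0, 4.0]]
```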

Example with fillna():

python
import pandas as pd
import torch

df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})
df_filled = df.fillna(df.mean())
tensor = torch.tensor(df_filled.values, dtype=torch.float32)
print(tensor)
output
tensor([[1.0000, 3.0000],
        [2.0000, 2.0000],
        [2.3333, 3.0000],
        [4.0000, 4.0000]])

Preventing it in production

Check for missing values before data reaches the model, for example with df.isnull().sum(). Automate the fill-or-drop step in your preprocessing pipeline. When the share of missing data exceeds a threshold you define, raise an alert or reject the batch so bad data does not silently reach production ML workflows.
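A threshold check of the kind described above can be sketched by computing the share of missing values per column and failing when any column exceeds a limit. check_missing and max_fraction are hypothetical names, and the 10% default is an arbitrary assumption you would tune for your data:

```python
import pandas as pd

def check_missing(df: pd.DataFrame, max_fraction: float = 0.1) -> pd.Series:
    """Hypothetical helper: return the per-column share of missing values,
    or raise if any column exceeds max_fraction."""
    fractions = df.isnull().mean()  # share of NaN in each column
    too_sparse = fractions[fractions > max_fraction]
    if not too_sparse.empty:
        # In a real pipeline, this is where you would alert or reject the batch
        raise ValueError(f"Too many missing values in columns: {list(too_sparse.index)}")
    return fractions

batch = pd.DataFrame({'a': [1, None, 3, 4], 'b': [4, 5, 6, 7]})
print(check_missing(batch, max_fraction=0.3).tolist())  # [0.25, 0.0]

try:
    check_missing(batch, max_fraction=0.1)
except ValueError as err:
    print(err)  # Too many missing values in columns: ['a']
```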

python
import pandas as pd

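def preprocess(df):
    # Fill NaN in numeric columns with each column's mean;
    # numeric_only=True avoids a TypeError on non-numeric columns in pandas 2.x
    if df.isnull().sum().sum() > 0:
        df = df.fillna(df.mean(numeric_only=True))
    return df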

# Example usage
raw_df = pd.DataFrame({'a': [1, None, 3], 'b': [4, 5, None]})
clean_df = preprocess(raw_df)
print(clean_df)
output
     a    b
0  1.0  4.0
1  2.0  5.0
2  3.0  4.5
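One caveat with preprocess(): it computes the fill values from whichever batch it receives. A small inference batch can therefore produce a NaN mean and leave gaps unfilled. A common alternative is to compute the fill values once on training data and reuse them at inference time. The sketch below assumes that approach; fit_fill_values and apply_fill_values are hypothetical helper names:

```python
import pandas as pd

def fit_fill_values(train_df: pd.DataFrame) -> pd.Series:
    # Column means computed once on the training data (numeric columns only)
    return train_df.mean(numeric_only=True)

def apply_fill_values(df: pd.DataFrame, fill_values: pd.Series) -> pd.DataFrame:
    # fillna() with a Series matches the Series index against column names
    return df.fillna(fill_values)

train_df = pd.DataFrame({'a': [1, None, 3], 'b': [4, 5, None]})
fill_values = fit_fill_values(train_df)  # a -> 2.0, b -> 4.5

# A single-row inference batch: its own mean for 'a' is NaN
new_batch = pd.DataFrame({'a': [float('nan')], 'b': [7.0]})
print(new_batch.fillna(new_batch.mean()).isnull().sum().sum())  # 1 (batch mean cannot fill 'a')
print(apply_fill_values(new_batch, fill_values)['a'].tolist())  # [2.0]
```

In practice, the fitted values would be saved alongside the model so training and serving fill gaps the same way.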

Key Takeaways

  • Always check for missing values with df.isnull().sum() before ML processing.
  • Use df.fillna() to replace missing values with meaningful defaults or statistics.
  • Use df.dropna() to remove incomplete rows if appropriate for your dataset.
  • Convert the cleaned DataFrame to a NumPy array (for example df.to_numpy(dtype="float32")) before creating PyTorch tensors.
  • Automate missing data handling in production pipelines to avoid runtime errors.
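Taken together, these points can be sketched as the last step before tensor creation: count missing values, fill them, and convert to a float32 NumPy array. The sketch uses only pandas and NumPy. The closing comment about torch.from_numpy() is an assumption about how your pipeline continues, not part of the original example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})

print(df.isnull().sum().tolist())  # [1, 1] -> one NaN in each column

clean = df.fillna(df.mean())
features = clean.to_numpy(dtype=np.float32)  # float32 matches PyTorch's default float dtype

assert not np.isnan(features).any()  # fail fast if anything slipped through
print(features.dtype, features.shape)  # float32 (4, 2)
# torch.from_numpy(features) could now wrap this array without copying it
```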
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022