How to handle missing values in pandas

Q: How to handle missing values in pandas

Use pandas methods like dropna() to remove missing values or fillna() to replace them with specific values or strategies. For more advanced handling, use interpolate() to estimate missing data based on existing values.
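As a rough illustration of the three methods named above, the sketch below applies each one to a small made-up DataFrame. The column names and values are assumptions chosen for the example, and `interpolate()` is called with its default linear method.

```python
import pandas as pd

# Hypothetical sample data with one gap in each column
df = pd.DataFrame({"temp": [20.0, None, 24.0, 26.0],
                   "humidity": [0.30, 0.35, None, 0.45]})

dropped = df.dropna()            # keep only rows with no missing values
filled = df.fillna(0)            # replace every gap with a constant
interpolated = df.interpolate()  # estimate each gap linearly from its neighbors

print(dropped)
print(filled)
print(interpolated)
```

Which method fits depends on the data: dropping loses rows, a constant can distort statistics, and interpolation mainly makes sense when row order is meaningful, as in a time series.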

⚡ QUICK FIX

Use df.fillna() or df.dropna() to handle missing values explicitly before model training.

Why this happens

Missing values appear in pandas DataFrames when source data is incomplete or corrupted. pandas represents them as NaN, and a None in a numeric column is converted to NaN as well. Left unhandled, they cause errors or unexpected behavior, such as NaN outputs, in PyTorch models and data pipelines that expect complete numeric inputs.

Example code that triggers the problem:

python

import pandas as pd
import torch

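# None in a numeric column is stored as NaN
df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})
# Convert to a tensor without handling the missing values first
tensor = torch.tensor(df.values)
print(tensor)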

output

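tensor([[1., nan],
        [2., 2.],
        [nan, 3.],
        [4., 4.]], dtype=torch.float64)

No exception is raised here: the NaN values pass straight into the tensor and propagate through any later arithmetic, which typically surfaces as a NaN loss or NaN predictions during training.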
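Before choosing a fix, it can help to see where the gaps are. The following is a minimal sketch using the same example DataFrame. The variable names `missing_per_column` and `rows_with_gaps` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})

missing_per_column = df.isna().sum()        # count of NaN in each column
rows_with_gaps = df[df.isna().any(axis=1)]  # rows holding at least one NaN

print(missing_per_column)
print(rows_with_gaps)
```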

The fix

Use dropna() to remove rows with missing values or fillna() to replace missing values with a constant or a computed value like the mean. This ensures the DataFrame has no NaN before converting to tensors.

Example with fillna():

python

import pandas as pd
import torch

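df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})
# Replace each NaN with the mean of its own column
df_filled = df.fillna(df.mean())
# Convert to a NumPy array, then to a float32 tensor
tensor = torch.tensor(df_filled.to_numpy(), dtype=torch.float32)
print(tensor)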

output

tensor([[1.0000, 3.0000],
        [2.0000, 2.0000],
        [2.3333, 3.0000],
        [4.0000, 4.0000]])
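The dropna() route mentioned in this section could look roughly like the sketch below. It stops at a float32 NumPy array and leaves out PyTorch so that it runs with pandas alone. Passing the array on to torch.tensor() is assumed to be the next step.

```python
import pandas as pd

df = pd.DataFrame({'feature1': [1, 2, None, 4], 'feature2': [None, 2, 3, 4]})

# Drop every row that has at least one NaN, then renumber the index
df_dropped = df.dropna().reset_index(drop=True)

# float32 NumPy array with no NaN, ready to hand to torch.tensor()
features = df_dropped.to_numpy(dtype="float32")
print(features)
```

Here only two of the four rows survive, which shows the trade-off: dropna() guarantees clean inputs but can discard a large share of a small dataset.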

Preventing it in production

Validate data for missing values before it reaches the model, for example with df.isnull().sum(). Automate handling with pipeline steps that fill or drop missing data, and trigger alerts (or retry the upstream load) when the share of missing data exceeds a threshold, so data quality problems surface before they affect production ML workflows.

python

import pandas as pd

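def preprocess(df):
    """Fill missing values in numeric columns with the column mean."""
    # Only do the work when at least one value is missing
    if df.isnull().sum().sum() > 0:
        # numeric_only=True skips non-numeric columns, which have no mean
        df = df.fillna(df.mean(numeric_only=True))
    return df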

# Example usage
raw_df = pd.DataFrame({'a': [1, None, 3], 'b': [4, 5, None]})
clean_df = preprocess(raw_df)
print(clean_df)

output

     a    b
0  1.0  4.0
1  2.0  5.0
2  3.0  4.5
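The threshold check described in this section might be sketched as follows. The function name `validate_missing` and the 25% limit are hypothetical choices for illustration, and raising a ValueError stands in for whatever alerting mechanism a real pipeline uses.

```python
import pandas as pd

# Hypothetical limit: at most 25% of any column may be missing
MAX_MISSING_FRACTION = 0.25

def validate_missing(df, max_fraction=MAX_MISSING_FRACTION):
    """Raise ValueError if any column exceeds the allowed share of missing values."""
    fractions = df.isna().mean()  # share of NaN per column
    too_sparse = fractions[fractions > max_fraction]
    if not too_sparse.empty:
        raise ValueError(f"Too many missing values: {too_sparse.to_dict()}")
    return df

# Column 'a' is 20% missing, which is within the limit
ok_df = pd.DataFrame({'a': [1, 2, 3, 4, None], 'b': [1, 2, 3, 4, 5]})
validate_missing(ok_df)

# Both columns are about 33% missing, which exceeds the limit
bad_df = pd.DataFrame({'a': [1, None, 3], 'b': [4, 5, None]})
try:
    validate_missing(bad_df)
except ValueError as exc:
    print(exc)
```

Running a check like this before the fill step keeps a heavily incomplete batch from being silently patched with column means.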

Related errors

Error	Cause	Quick fix
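TypeError: can't convert np.ndarray of type numpy.object_	DataFrame has object-dtype columns (for example strings mixed with missing values) when converting to tensor	Cast columns to a numeric dtype and use df.fillna() or df.dropna() before conversion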
RuntimeError: input contains NaN or Inf	Tensor contains NaN after conversion	Validate and clean data with pandas before tensor creation
TypeError: expected np.ndarray (got DataFrame)	Passing a DataFrame directly to torch.from_numpy()	Use df.values or df.to_numpy() to convert to a NumPy array first

Key Takeaways

Always check for missing values with df.isnull().sum() before ML processing.
Use df.fillna() to replace missing values with meaningful defaults or statistics.
Use df.dropna() to remove incomplete rows if appropriate for your dataset.
Convert the cleaned DataFrame to a NumPy array with df.to_numpy() before creating PyTorch tensors.
Automate missing data handling in production pipelines to avoid runtime errors.

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022
