How to split features and labels in pandas
Quick answer
Use pandas DataFrame indexing to separate features and labels by selecting columns. Typically, features are all columns except the target label column: use df.drop() for the features and df['label_column'] for the labels.
Prerequisites
- Python 3.8+
- pip install "pandas>=1.0"
- Basic knowledge of pandas DataFrame
Setup
Install pandas if not already installed to handle DataFrame operations.
pip install pandas
Step by step
This example shows how to split features and labels from a pandas DataFrame for PyTorch model training.
import pandas as pd
# Sample DataFrame with features and label
data = {
'feature1': [5, 6, 7, 8],
'feature2': [10, 20, 30, 40],
'label': [0, 1, 0, 1]
}
df = pd.DataFrame(data)
# Split features (X) and labels (y)
X = df.drop(columns=['label']) # Features: all columns except 'label'
y = df['label'] # Labels: the 'label' column
print("Features (X):")
print(X)
print("\nLabels (y):")
print(y)
output
Features (X):
   feature1  feature2
0         5        10
1         6        20
2         7        30
3         8        40

Labels (y):
0    0
1    1
2    0
3    1
Name: label, dtype: int64
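Because the split is meant to feed a PyTorch model, a common next step is to turn X and y into NumPy arrays with explicit dtypes. The sketch below is an assumption about the downstream workflow rather than part of the original example: the float32 and int64 dtypes are illustrative choices, and the resulting arrays could then be handed to something like torch.from_numpy.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'feature1': [5, 6, 7, 8],
    'feature2': [10, 20, 30, 40],
    'label': [0, 1, 0, 1],
})

X = df.drop(columns=['label'])
y = df['label']

# Convert to NumPy arrays with explicit dtypes (illustrative choices)
X_np = X.to_numpy(dtype=np.float32)  # 2-D array: (n_samples, n_features)
y_np = y.to_numpy(dtype=np.int64)    # 1-D array: (n_samples,)

print(X_np.shape, X_np.dtype)
print(y_np.shape, y_np.dtype)
```

Setting the dtypes at this point keeps the conversion explicit, so any mismatch with what the model expects shows up before training starts.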
Common variations
You can split features and labels using column positions or multiple label columns.
import pandas as pd
# Reuses the df defined in the previous example
# Using column positions (iloc) to split
X_pos = df.iloc[:, :-1] # all rows, all columns except last
y_pos = df.iloc[:, -1] # all rows, last column
print("Features using iloc:")
print(X_pos)
print("\nLabels using iloc:")
print(y_pos)
# Multiple label columns example
data_multi_label = {
'feature1': [1, 2, 3],
'feature2': [4, 5, 6],
'label1': [0, 1, 0],
'label2': [1, 0, 1]
}
df_multi = pd.DataFrame(data_multi_label)
X_multi = df_multi.drop(columns=['label1', 'label2'])
y_multi = df_multi[['label1', 'label2']]
print("\nFeatures with multiple labels:")
print(X_multi)
print("\nLabels with multiple columns:")
print(y_multi)
output
Features using iloc:
   feature1  feature2
0         5        10
1         6        20
2         7        30
3         8        40

Labels using iloc:
0    0
1    1
2    0
3    1
Name: label, dtype: int64

Features with multiple labels:
   feature1  feature2
0         1         4
1         2         5
2         3         6

Labels with multiple columns:
   label1  label2
0       0       1
1       1       0
2       0       1
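One further variation, not covered in the examples above, is DataFrame.pop, which removes a column and returns it in a single call. The sketch below works on a copy (the names X_pop and y_pop are illustrative) because pop modifies the DataFrame in place.

```python
import pandas as pd

df = pd.DataFrame({
    'feature1': [5, 6, 7, 8],
    'feature2': [10, 20, 30, 40],
    'label': [0, 1, 0, 1],
})

# Work on a copy so the original DataFrame is left untouched
X_pop = df.copy()

# pop() removes 'label' from X_pop in place and returns it as a Series
y_pop = X_pop.pop('label')

print(X_pop.columns.tolist())
print(y_pop.name)
```

This suits a single label column. For several label columns, drop(columns=[...]) as shown earlier is usually simpler.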
Troubleshooting
- If you get a KeyError when dropping columns, verify the column names exactly match those in the DataFrame.
- Ensure your label column is not included in the features to avoid data leakage during model training.
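Both checks above can be automated. The following is a minimal sketch, assuming the mismatch comes from stray whitespace in a column name. The label_cols list and the error messages are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'feature1': [5, 6, 7, 8],
    'feature2': [10, 20, 30, 40],
    ' label': [0, 1, 0, 1],  # stray leading space, a common cause of KeyError
})

label_cols = ['label']  # hypothetical list of label column names

# Normalize column names by stripping surrounding whitespace
df.columns = df.columns.str.strip()

# Fail early with a readable message if any label column is missing
missing = [c for c in label_cols if c not in df.columns]
if missing:
    raise KeyError(f"Label columns not found: {missing}. Available: {list(df.columns)}")

X = df.drop(columns=label_cols)
y = df[label_cols[0]]

# Guard against leakage: no label column may remain among the features
leaked = set(label_cols) & set(X.columns)
assert not leaked, f"Label columns leaked into features: {leaked}"
```

Printing list(df.columns) before splitting is often enough to spot differences in case or spacing.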
Key Takeaways
- Use df.drop(columns=[...]) to select feature columns excluding labels.
- Access a label column directly with df['label_column'], or multiple labels with a list of column names.
- Column position indexing with iloc is a flexible alternative for splitting.
- Always verify column names to avoid KeyError when dropping columns.
- Keep features and labels separate to prepare data correctly for PyTorch training.