
Understanding The Execution Of Dataframe In Python

I am new to Python and I want to understand how execution takes place in a DataFrame. Let's try this with an example from the dataset found on kaggle.com (Titanic: Machine L

Solution 1:

It's because of groupby + transform. When you group with an aggregation that returns a scalar per group, a normal groupby collapses the result to a single row for each unique grouping key.

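import numpy as np
import pandas as pd

np.random.seed(42)
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': np.random.choice([1, 10, 11, 13, np.nan], 10)},
                  index=list('ABCDEFGHIJ'))
df.groupby('Sex')['Age'].mean()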

#Sex
#F    10.5                # One F row
#M    11.5                # One M row
#Name: Age, dtype: float64

Using transform instead broadcasts this result back to the original index, so each row receives the value computed for the group it belongs to.

df.groupby('Sex')['Age'].transform('mean')

#A    11.5  # Belonged to M
#B    10.5  # Belonged to F
#C    11.5  # Belonged to M
#D    11.5
#E    10.5
#F    10.5
#G    11.5
#H    11.5
#I    10.5
#J    11.5
#Name: Age, dtype: float64
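One way to see what "broadcast" means here is to rebuild the same result by hand: compute the per-group means, then look up each row's group in that small table. The sketch below is only an illustration. It writes the ages out explicitly (the values the seeded example produces) instead of relying on the random seed, and the names group_means, broadcast and transformed are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Same toy frame as above, with the ages written out explicitly
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': [13, np.nan, 11, np.nan, np.nan, 10, 11, 11, 11, np.nan]},
                  index=list('ABCDEFGHIJ'))

# Aggregation: one value per group (2 rows)
group_means = df.groupby('Sex')['Age'].mean()

# Manual broadcast: look up each row's group mean by its 'Sex' value
broadcast = df['Sex'].map(group_means)

# transform does the aggregation and the broadcast in one step (10 rows)
transformed = df.groupby('Sex')['Age'].transform('mean')

print(len(group_means), len(transformed))
print(transformed.index.equals(df.index))
print(np.allclose(broadcast.values, transformed.values))
```

The transformed Series has the same length and index as df, which is what lets it line up row by row with the original column.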

To make it crystal clear, I'll assign the transformed result back as a new column, and now you can see how .fillna gets the correct mean for each row.

df['Sex_mean'] = df.groupby('Sex')['Age'].transform('mean')

  Sex   Age  Sex_mean
A   M  13.0      11.5
B   F   NaN      10.5  # NaN will be filled with 10.5
C   M  11.0      11.5
D   M   NaN      11.5  # NaN will be filled with 11.5
E   F   NaN      10.5  # NaN will be filled with 10.5
F   F  10.0      10.5
G   M  11.0      11.5
H   M  11.0      11.5
I   F  11.0      10.5
J   M   NaN      11.5  # NaN will be filled with 11.5
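Putting the pieces together, the fill itself might look like the sketch below. The exact line from the question is not shown here, so this assumes the common pattern of passing the transformed Series to .fillna; sex_mean and filled are illustrative names. Because .fillna aligns a Series argument on the index, each missing Age takes the mean from its own row's group, and non-missing values are left alone.

```python
import numpy as np
import pandas as pd

# Same toy frame as above, with the ages written out explicitly
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': [13, np.nan, 11, np.nan, np.nan, 10, 11, 11, 11, np.nan]},
                  index=list('ABCDEFGHIJ'))

# Per-row group mean, aligned to df's index
sex_mean = df.groupby('Sex')['Age'].transform('mean')

# fillna matches on index labels: only NaN positions are replaced
filled = df['Age'].fillna(sex_mean)

print(filled)
```

Rows B and E (both F) end up with 10.5, rows D and J (both M) with 11.5, and every row that already had an age keeps it.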
