Understanding The Execution Of Dataframe In Python
I am new to Python and I want to understand how execution takes place in a DataFrame. Let's try this with an example from the dataset found on kaggle.com (Titanic: Machine L
Solution 1:
It's because of groupby + transform. When you group with an aggregation that returns a scalar per group, a normal groupby collapses to a single row for each unique grouping key.
import numpy as np
import pandas as pd
# Seed the generator so the sample data is reproducible
np.random.seed(42)
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': np.random.choice([1, 10, 11, 13, np.nan], 10)},
                  index=list('ABCDEFGHIJ'))
df.groupby('Sex')['Age'].mean()
#Sex
#F    10.5  # One F row
#M    11.5  # One M row
#Name: Age, dtype: float64
Using transform will broadcast this result back to the original index based on the group that each row belonged to.
df.groupby('Sex')['Age'].transform('mean')
#A    11.5  # Belonged to M
#B    10.5  # Belonged to F
#C    11.5  # Belonged to M
#D    11.5
#E    10.5
#F    10.5
#G    11.5
#H    11.5
#I    10.5
#J    11.5
#Name: Age, dtype: float64
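One way to see the broadcast is to rebuild it by hand: compute the per-group means once, then look each row's Sex up in that result. This is only a sketch for comparison, not what pandas does internally. The Age values are written out literally (they are the ones the seeded example produces), and the names group_means, mapped and transformed are just illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the seeded example, written out so no random state is needed.
df = pd.DataFrame(
    {"Sex": list("MFMMFFMMFM"),
     "Age": [13, np.nan, 11, np.nan, np.nan, 10, 11, 11, 11, np.nan]},
    index=list("ABCDEFGHIJ"),
)

# One mean per group, indexed by Sex (the collapsed groupby result).
group_means = df.groupby("Sex")["Age"].mean()

# Look up each row's group mean by its Sex label; the result keeps df's index.
mapped = df["Sex"].map(group_means)

# transform does the aggregation and the lookup in a single step.
transformed = df.groupby("Sex")["Age"].transform("mean")

print(pd.DataFrame({"mapped": mapped, "transformed": transformed}))
```

Both columns carry one value per original row, on the original index A to J, which is what makes the result usable row by row.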
To make it crystal clear, I'll assign the transformed result back, and now you can see how .fillna gets the correct mean.
df['Sex_mean'] = df.groupby('Sex')['Age'].transform('mean')
  Sex   Age  Sex_mean
A   M  13.0      11.5
B   F   NaN      10.5  # NaN will be filled with 10.5
C   M  11.0      11.5
D   M   NaN      11.5  # NaN will be filled with 11.5
E   F   NaN      10.5  # NaN will be filled with 10.5
F   F  10.0      10.5
G   M  11.0      11.5
H   M  11.0      11.5
I   F  11.0      10.5
J   M   NaN      11.5  # NaN will be filled with 11.5
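The fill step itself can be sketched as follows, assuming the line in the question passes the transformed Series straight to .fillna. When .fillna is given a Series, it aligns on the index, so each missing Age takes the value stored at its own row label. The column name Age_filled is made up for this illustration so the original Age stays visible next to it.

```python
import numpy as np
import pandas as pd

# Same data as the seeded example, written out so no random state is needed.
df = pd.DataFrame(
    {"Sex": list("MFMMFFMMFM"),
     "Age": [13, np.nan, 11, np.nan, np.nan, 10, 11, 11, 11, np.nan]},
    index=list("ABCDEFGHIJ"),
)

# Per-row group mean, on the same index as df.
group_mean = df.groupby("Sex")["Age"].transform("mean")

# Only the NaN rows are replaced; existing ages are left untouched.
df["Age_filled"] = df["Age"].fillna(group_mean)

print(df)
```

Rows B and E (both F) receive 10.5, rows D and J (both M) receive 11.5, and every row that already had an age keeps it.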