Python Dataframe Drop Bad Lists Using Numpy Logical Operations
I have a data frame filled with lists. I want to extract the min and max of each list and drop the rows where those values fall below a specified threshold. I have written a function in which the first part is already complete.
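For reference, the task described in the question (per-list min and max, then drop rows below a threshold) can be sketched row by row with `Series.map` before moving to a vectorized approach. This is not the asker's function, which is not shown; the names `threshold_x` and `threshold_y` are hypothetical.

```python
import pandas as pd

# Sample data in the same shape as the question: each cell holds a list.
df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})

# Min/max of each list; map still calls Python's min/max once per row.
x_min = df['x'].map(min)
x_max = df['x'].map(max)
y_min = df['y'].map(min)
y_max = df['y'].map(max)

# Hypothetical thresholds: keep rows whose list minima exceed these values.
threshold_x, threshold_y = 0, -1
kept = df[(x_min > threshold_x) & (y_min > threshold_y)]
print(kept)
```

This is fine for small frames, but it loops in Python, which is what the answer below avoids.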
Solution 1:
Instead of iterating through rows and analyzing each list, use explode and groupby to compute every list's min and max, then filter with vectorized comparisons rather than a row-by-row loop. Here's one way to do it:
import pandas as pd

df = pd.DataFrame({'x':[[-1,0,1,2,10],[1.5,2,4,5]],'y':[[2.5,2.4,2.3,1.5,0.1],[5,4.5,3,-0.1]]})
for col in ['x', 'y']:
    # one row per list element; the 'index' column records the original row
    dfe = df[[col]].explode(col).reset_index()
    dfe_min = dfe.groupby('index')[col].min().reset_index()
    dfe_max = dfe.groupby('index')[col].max().reset_index()
    dfe_min = dfe_min.rename(columns={col: col + '_min'})
    dfe_max = dfe_max.rename(columns={col: col + '_max'})
    dfe_min = dfe_min.merge(dfe_max, on='index', how='left')
    df = df.join(dfe_min)
    # drop the helper column inside the loop so the next join has no overlapping column
    del df['index']
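A more compact sketch of the same explode/groupby idea (an alternative, not the answer's exact code): because `Series.explode` keeps the original row label on every element, grouping on `level=0` maps each element back to its row without `reset_index` or `merge`. It assumes every list holds numbers; the `.astype(float)` cast is an addition so the new columns come out as floats rather than object dtype.

```python
import pandas as pd

df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})

for col in ['x', 'y']:
    # explode repeats the original row label for every list element,
    # so grouping on level 0 collects each list's elements again.
    exploded = df[col].explode().astype(float)
    stats = exploded.groupby(level=0).agg(['min', 'max'])
    stats.columns = [f'{col}_min', f'{col}_max']
    # join aligns on the row labels, attaching the new columns to df.
    df = df.join(stats)

print(df[['x_min', 'x_max', 'y_min', 'y_max']])
```

An empty list explodes to a single NaN, so its min and max would be NaN and every `>` comparison on that row would evaluate to False.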
to get
                   x                          y  x_min  x_max  y_min  y_max
0  [-1, 0, 1, 2, 10]  [2.5, 2.4, 2.3, 1.5, 0.1]   -1.0     10    0.1    2.5
1     [1.5, 2, 4, 5]          [5, 4.5, 3, -0.1]    1.5      5   -0.1    5.0
Then filter the rows on the min and max columns:
# figure out what values you want to require
value_a, value_b, value_c, value_d = 0, -1, 1, 1
df = df[(df['x_min'] > value_a) & (df['y_min'] > value_b) & (df['x_max'] > value_c) & (df['y_max'] > value_d)]
to get
                x                  y  x_min  x_max  y_min  y_max
1  [1.5, 2, 4, 5]  [5, 4.5, 3, -0.1]    1.5      5   -0.1    5.0
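Since the title asks about NumPy logical operations, here is a sketch of the same filter using `np.logical_and.reduce`, which ANDs a list of boolean arrays element-wise. The DataFrame is hand-built from the summary columns in the first table (assumption: the list columns are left out for brevity); the thresholds are the same `value_a` through `value_d`.

```python
import numpy as np
import pandas as pd

# Summary columns only, matching the x_min/x_max/y_min/y_max values computed earlier.
df = pd.DataFrame({'x_min': [-1.0, 1.5], 'x_max': [10.0, 5.0],
                   'y_min': [0.1, -0.1], 'y_max': [2.5, 5.0]})

value_a, value_b, value_c, value_d = 0, -1, 1, 1

# np.logical_and.reduce combines any number of conditions with AND,
# which avoids a long chain of & operators and their parentheses.
mask = np.logical_and.reduce([
    df['x_min'] > value_a,
    df['y_min'] > value_b,
    df['x_max'] > value_c,
    df['y_max'] > value_d,
])
filtered = df[mask]
print(filtered)
```

Row 0 fails only the `x_min > 0` test, so just row 1 survives, as in the table above.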