
Python Dataframe Drop Bad Lists Using Numpy Logical Operations

I have a data frame filled with lists. I want to extract the min and max from each list and drop the ones that fall below a specified value. I have written a function in which the first part is completed

Solution 1:

Instead of iterating through rows and analyzing lists, use explode, groupby, and vectorization to test everything without iterating. Here's one way to do it:

import pandas as pd

df = pd.DataFrame({'x':[[-1,0,1,2,10],[1.5,2,4,5]],'y':[[2.5,2.4,2.3,1.5,0.1],[5,4.5,3,-0.1]]})

for col in ['x', 'y']:
    # one row per list element; the original row label is kept in 'index'
    dfe = df[[col]].explode(col).reset_index()

    # per-row min and max of the exploded values
    dfe_min = dfe.groupby('index')[col].min().reset_index()
    dfe_max = dfe.groupby('index')[col].max().reset_index()
    dfe_min = dfe_min.rename(columns={col:col + '_min'})
    dfe_max = dfe_max.rename(columns={col:col + '_max'})
    dfe_min = dfe_min.merge(dfe_max, on='index', how='left')

    # attach the new columns, then drop the helper 'index' column
    df = df.join(dfe_min)
    del df['index']

to get

                   x                          y  x_min  x_max  y_min  y_max
0  [-1, 0, 1, 2, 10]  [2.5, 2.4, 2.3, 1.5, 0.1]   -1.0     10    0.1    2.5
1     [1.5, 2, 4, 5]          [5, 4.5, 3, -0.1]    1.5      5   -0.1    5.0
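
The same per-row statistics can probably be computed a little more compactly by grouping on the index level that explode preserves, which avoids the merge and join steps. This is only a sketch: add_min_max is a hypothetical helper name, the cast to float assumes every list holds numbers, and the row labels are assumed to be unique.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
    'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]],
})

def add_min_max(frame, cols):
    """Return a copy of frame with <col>_min and <col>_max columns added.

    Hypothetical helper; assumes numeric lists and a unique index.
    """
    out = frame.copy()
    for col in cols:
        # explode repeats the original row label for every list element,
        # so grouping on index level 0 yields one result row per list
        exploded = out[col].explode().astype(float)
        stats = exploded.groupby(level=0).agg(['min', 'max'])
        # assignment aligns on the index, so no merge or join is needed
        out[col + '_min'] = stats['min']
        out[col + '_max'] = stats['max']
    return out

result = add_min_max(df, ['x', 'y'])
print(result[['x_min', 'x_max', 'y_min', 'y_max']])
```

Because of the float cast, all four new columns come back as floats here, whereas the loop above leaves them with whatever types the lists contained.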

Then filter the rows by their min and max values:

# figure out what values you want to require
value_a, value_b, value_c, value_d = 0, -1, 1, 1
df = df[(df['x_min'] > value_a) & (df['y_min'] > value_b) & (df['x_max'] > value_c) & (df['y_max'] > value_d)]

to get

                x                  y  x_min  x_max  y_min  y_max
1  [1.5, 2, 4, 5]  [5, 4.5, 3, -0.1]    1.5      5   -0.1    5.0
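
Since the question asks about NumPy logical operations, here is one way the same filter might be written with np.logical_and.reduce, which scales more comfortably than a long chain of & when there are many conditions. It is a sketch under assumptions: the stats frame is rebuilt by hand from the min and max columns shown earlier, and the thresholds dict is a hypothetical container for the four cutoff values.

```python
import numpy as np
import pandas as pd

# min/max columns as produced by the first step (rebuilt by hand here)
stats = pd.DataFrame({
    'x_min': [-1.0, 1.5], 'x_max': [10.0, 5.0],
    'y_min': [0.1, -0.1], 'y_max': [2.5, 5.0],
})

# hypothetical thresholds: a row is kept only if every column exceeds its limit
thresholds = {'x_min': 0, 'y_min': -1, 'x_max': 1, 'y_max': 1}

# one boolean array per condition
conditions = [stats[name].to_numpy() > limit for name, limit in thresholds.items()]

# element-wise AND across all conditions, giving one boolean per row
keep = np.logical_and.reduce(conditions)

filtered = stats[keep]
print(filtered)
```

Adding or removing a requirement then only means editing the thresholds dict rather than the boolean expression.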
