
How To Fill The Null Values With The Average Of All The Preceding Values Before Null And First Succeeding Value After Null In Python?

I have a dataframe with 5000 records. I want the null values to be filled with: Average(All the Preceding values before null, First succeeding value after null) data: Date

Solution 1:

The following seems to work. Define a function to apply to each row; it modifies df in place, so it refers to the global df rather than to a copy. Each time a row with null values is reached, take an expanding mean of df, using a shift so that the following row is included. Then use loc to overwrite the nulls in df with the new values:

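def foo(row):
    # Only rows that contain at least one null need any work.
    if row.isna().any():
        # Expanding mean shifted up one row: all preceding values plus the next row.
        df.loc[row.name, row.isna()] = df.expanding().mean().shift(-1).loc[row.name, :]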

Applying:

>>> df.apply(foo, axis=1)
>>> df
               gcs       Comp       Clay        WTS
Date
2020-01-01  1550.0  41.000000   9.410000  22.600000
2020-01-02  1540.0  48.000000   9.500000  25.800000
2020-01-03  1544.0  43.666667   9.403333  24.033333
2020-01-04  1542.0  42.000000   9.300000  23.700000
2020-01-05  1580.0  48.000000   9.100000  21.200000
2020-01-06  1546.0  43.777778   9.452222  22.922222
2020-01-07  1520.0  40.000000  10.000000  20.200000
2020-01-08  1523.0  30.000000  25.000000  19.000000

Note that I moved your Date column to be an index. I think the above should work wherever the missing values are, ensuring that the values are filled in from top to bottom.
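For a self-contained way to try the top-to-bottom fill, here is a minimal sketch written as an explicit loop instead of an apply with side effects. The sample frame is an assumption: it is reconstructed from the output above, with the rows for 2020-01-03 and 2020-01-06 taken to be entirely null. The name fill_top_to_bottom is hypothetical. Like the shifted expanding mean, it leaves a null in the last row untouched, since there is no following value to include.

```python
import numpy as np
import pandas as pd

# Assumed sample input, reconstructed from the answer's output
# (rows 3 and 6 are taken to be entirely null).
dates = pd.date_range("2020-01-01", periods=8, freq="D", name="Date")
df = pd.DataFrame(
    {
        "gcs": [1550, 1540, np.nan, 1542, 1580, np.nan, 1520, 1523],
        "Comp": [41, 48, np.nan, 42, 48, np.nan, 40, 30],
        "Clay": [9.41, 9.5, np.nan, 9.3, 9.1, np.nan, 10, 25],
        "WTS": [22.6, 25.8, np.nan, 23.7, 21.2, np.nan, 20.2, 19],
    },
    index=dates,
)


def fill_top_to_bottom(frame):
    """Fill nulls row by row so earlier imputed values feed later ones."""
    out = frame.astype(float).copy()
    # The last row is skipped: it has no following row to include.
    for pos in range(len(out) - 1):
        row = out.iloc[pos]
        if row.isna().any():
            # Mean of every row up to and including the next one;
            # DataFrame.mean skips nulls by default.
            means = out.iloc[: pos + 2].mean()
            out.iloc[pos] = row.fillna(means).to_numpy()
    return out


filled = fill_top_to_bottom(df)
print(filled)
```

Because the function returns a new frame, the original df keeps its nulls and the two can be compared side by side.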

I'm not sure how this will scale to 5000 rows, but it seems you have to use apply or some other loop, because you want imputed values to be included in the calculation of later imputed values*. I added the if statement because it sped up the calculation considerably:

%%timeit
df.apply(foo, axis=1)
#1.17 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df.apply(foo_without_if, axis=1)
#16.2 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
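On the scaling question: foo recomputes df.expanding().mean() over the whole frame for every row that contains a null, so the cost grows with both the number of rows and the number of null rows. One possible alternative, sketched below under the assumption that all columns are numeric, is a single pass that keeps a running sum and count per column. The name fill_running_mean and the demo frame are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def fill_running_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Single-pass fill: mean of all earlier values (imputed ones included)
    plus the next row's value, where that value is available."""
    values = frame.to_numpy(dtype=float, copy=True)
    n_rows, n_cols = values.shape
    totals = np.zeros(n_cols)  # running sum per column
    counts = np.zeros(n_cols)  # running count of non-null values per column
    for i in range(n_rows):
        row = values[i]  # a view, so writes land in `values`
        missing = np.isnan(row)
        if missing.any() and i + 1 < n_rows:
            nxt = values[i + 1]
            has_next = ~np.isnan(nxt)
            numer = totals + np.where(has_next, nxt, 0.0)
            denom = counts + has_next
            # 0/0 yields NaN, which simply leaves the cell null.
            with np.errstate(invalid="ignore", divide="ignore"):
                means = numer / denom
            row[missing] = means[missing]
        present = ~np.isnan(row)
        totals[present] += row[present]
        counts[present] += 1
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


demo = pd.DataFrame(
    {
        "a": [1.0, 3.0, np.nan, 5.0, np.nan, 7.0],
        "b": [np.nan, 2.0, 4.0, np.nan, 6.0, np.nan],
    }
)
result = fill_running_mean(demo)
print(result)
```

A null in the last row stays null here, as it does with the shifted expanding mean, because there is no following value to average in.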

*If you don't need this (i.e. you can just take the expanding mean and ignore nulls from earlier rows rather than their imputed values), you can do:

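mask = df.isna()
# Keep the expanding means on the rows just after each null, then shift them up onto the null rows.
df[mask] = df.expanding().mean()[mask.shift(1, fill_value=False)].shift(-1)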
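To make the difference from the top-to-bottom fill concrete, here is a small sketch of the same one-shot idea using fillna, which only touches null cells and so needs no explicit mask. The single-column demo frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column example with two nulls.
demo = pd.DataFrame({"x": [1.0, 3.0, np.nan, 5.0, np.nan, 7.0]})

# The expanding mean skips nulls by default; shifting it up one row lines
# each null up with the mean of the original values through the next row.
candidates = demo.expanding().mean().shift(-1)

# fillna replaces only the null cells, aligned on index and columns.
one_shot = demo.fillna(candidates)
print(one_shot)
```

Here the second null becomes 4.0, the mean of 1, 3, 5 and 7. The top-to-bottom version would also count the first imputed value, 3.0, and give 3.8 instead.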
