
Linearly Interpolate Missing Rows In Pandas Dataframe

I have the following dataframe:

       Value         ts  year  JD  check  group_id
    0    NaN  950832000  2000  49    NaN     19987
    1  ...
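For reference, the first row's ts and JD describe the same day if ts is read as a Unix timestamp in seconds and JD as the day of the year (1 = 1 January). Both readings are assumptions; the question does not state them. A quick check:

```python
from datetime import date, timedelta

import pandas as pd

# Values copied from row 0 of the dataframe above.
ts = 950832000  # assumption: Unix timestamp in seconds (UTC)
jd = 49         # assumption: day of year, where 1 = 1 January
year = 2000

from_ts = pd.to_datetime(ts, unit="s").date()
from_jd = date(year, 1, 1) + timedelta(days=jd - 1)

print(from_ts, from_jd)  # 2000-02-18 2000-02-18
```

If that reading holds, pd.to_datetime(df.ts, unit="s") would be another way to build a DatetimeIndex without hard-coding the year.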

Solution 1:

You could convert your JD values to a DatetimeIndex and resample to daily frequency (see the resample docs). DataFrame.interpolate() will then fill in the missing values between existing values in the Value and JD columns as follows:

    from datetime import date
    import pandas as pd
    from dateutil.relativedelta import relativedelta

    start = date(2000, 1, 1)  # JD is the day of the year: JD 1 = 1 January 2000
    df.index = pd.DatetimeIndex(df.JD.apply(lambda x: start + relativedelta(days=int(x) - 1)))
    df = df.resample('D').asfreq()  # one row per calendar day; inserted rows are NaN
    df.loc[:, ['Value', 'JD']] = df.loc[:, ['Value', 'JD']].interpolate(method='linear', limit_direction='both', limit=100)
    df.tail(25)

                   Value         ts  year   JD  check  group_id
    2000-11-24  0.333167        NaN   NaN  329    NaN       NaN
    2000-11-25  0.333620        NaN   NaN  330    NaN       NaN
    2000-11-26  0.334074        NaN   NaN  331    NaN       NaN
    2000-11-27  0.334527        NaN   NaN  332    NaN       NaN
    2000-11-28  0.334980        NaN   NaN  333    NaN       NaN
    2000-11-29  0.335434        NaN   NaN  334    NaN       NaN
    2000-11-30  0.335887        NaN   NaN  335    NaN       NaN
    2000-12-01  0.336341        NaN   NaN  336    NaN       NaN
    2000-12-02  0.336794  975715200  2000  337      1     19987
    2000-12-03  0.337247        NaN   NaN  338    NaN       NaN
    2000-12-04  0.337701        NaN   NaN  339    NaN       NaN
    2000-12-05  0.338154        NaN   NaN  340    NaN       NaN
    2000-12-06  0.338608        NaN   NaN  341    NaN       NaN
    2000-12-07  0.339061        NaN   NaN  342    NaN       NaN
    2000-12-08  0.339514        NaN   NaN  343    NaN       NaN
    2000-12-09  0.339968        NaN   NaN  344    NaN       NaN
    2000-12-10  0.340421        NaN   NaN  345    NaN       NaN
    2000-12-11  0.340875        NaN   NaN  346    NaN       NaN
    2000-12-12  0.341328        NaN   NaN  347    NaN       NaN
    2000-12-13  0.341782        NaN   NaN  348    NaN       NaN
    2000-12-14  0.342235        NaN   NaN  349    NaN       NaN
    2000-12-15  0.342688        NaN   NaN  350    NaN       NaN
    2000-12-16  0.343142        NaN   NaN  351    NaN       NaN
    2000-12-17  0.343595        NaN   NaN  352    NaN       NaN
    2000-12-18  0.344049  977097600  2000  353      1     19987
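In current pandas, df.resample('D') on its own returns a lazy Resampler object rather than a DataFrame, so an explicit step such as .asfreq() is needed before interpolating. Below is a minimal self-contained sketch of the same approach, assuming a recent pandas version. It swaps relativedelta for pd.to_timedelta and uses only two rows (JD 337 and 353) copied from the output above, with the other columns dropped, so the interpolated values can be compared against that output.

```python
import pandas as pd

# Two observations copied from the output above; other columns omitted.
df = pd.DataFrame({
    "Value": [0.336794, 0.344049],
    "year": [2000, 2000],
    "JD": [337, 353],            # assumption: day of year, 1 = 1 January
    "group_id": [19987, 19987],
})

# Build a DatetimeIndex from 1 January of each row's year plus (JD - 1) days.
start = pd.to_datetime(df["year"].astype(str) + "-01-01")
df.index = pd.DatetimeIndex(start + pd.to_timedelta(df["JD"] - 1, unit="D"))

# Insert one row per calendar day; the inserted rows are all NaN.
daily = df.resample("D").asfreq()

# Linearly interpolate only the columns that vary smoothly between observations.
cols = ["Value", "JD"]
daily[cols] = daily[cols].interpolate(method="linear", limit_direction="both")

print(daily.loc["2000-12-02":"2000-12-04", cols])
```

The interpolated Value for 2000-12-03 and 2000-12-04 should round to the 0.337247 and 0.337701 shown in the output above; group_id stays NaN on inserted rows because it is not in cols.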

You will notice that .interpolate() only backfills missing values at the beginning of the series, which is due to the bounds_error behavior of scipy.interpolate.interp1d, as described in the scipy docs.
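What happens at the edges of a series can be checked directly on a toy Series (the values below are hypothetical). In recent pandas versions, method='linear' does not extrapolate past the first or last valid value: edge NaNs are either left as NaN or filled with the nearest valid value, depending on limit_direction and limit_area.

```python
import numpy as np
import pandas as pd

# Hypothetical series: a gap in the middle and NaNs at both ends.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default direction ("forward"): the leading NaN is left alone,
# the trailing NaN takes the last valid value (no extrapolation).
fwd = s.interpolate(method="linear")

# limit_direction="both": the leading NaN takes the first valid value too.
both = s.interpolate(method="linear", limit_direction="both")

# limit_area="inside": only NaNs surrounded by valid values are filled.
inside = s.interpolate(method="linear", limit_direction="both", limit_area="inside")

print(fwd.tolist())     # [nan, 1.0, 2.0, 3.0, 3.0]
print(both.tolist())    # [1.0, 1.0, 2.0, 3.0, 3.0]
print(inside.tolist())  # [nan, 1.0, 2.0, 3.0, nan]
```

If the edges of the resampled frame should stay empty rather than be padded, limit_area="inside" is the option to reach for.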
