Linearly Interpolate Missing Rows In Pandas Dataframe
I have the following dataframe:

       Value         ts  year  JD  check  group_id
    0    NaN  950832000  2000  49    NaN     19987
    1
Solution 1:
You could convert your JD values to a DatetimeIndex and resample to daily frequency (see docs). pandas.Series.interpolate() will then fill in the missing values between existing values in the Value and JD columns, as follows:
    from datetime import date
    from dateutil.relativedelta import relativedelta
    import pandas as pd

    # JD is the day of the year, so day 1 maps to 2000-01-01
    start = date(2000, 1, 1)
    df.index = pd.DatetimeIndex(df.JD.apply(lambda x: start + relativedelta(days=int(x) - 1)))
    # Recent pandas needs an explicit aggregation; older versions defaulted to the mean
    df = df.resample('D').mean()
    df.loc[:, ['Value', 'JD']] = df.loc[:, ['Value', 'JD']].interpolate(method='linear', limit_direction='both', limit=100)
    df.tail(25)

                   Value         ts  year   JD  check  group_id
    2000-11-24  0.333167        NaN   NaN  329    NaN       NaN
    2000-11-25  0.333620        NaN   NaN  330    NaN       NaN
    2000-11-26  0.334074        NaN   NaN  331    NaN       NaN
    2000-11-27  0.334527        NaN   NaN  332    NaN       NaN
    2000-11-28  0.334980        NaN   NaN  333    NaN       NaN
    2000-11-29  0.335434        NaN   NaN  334    NaN       NaN
    2000-11-30  0.335887        NaN   NaN  335    NaN       NaN
    2000-12-01  0.336341        NaN   NaN  336    NaN       NaN
    2000-12-02  0.336794  975715200  2000  337      1     19987
    2000-12-03  0.337247        NaN   NaN  338    NaN       NaN
    2000-12-04  0.337701        NaN   NaN  339    NaN       NaN
    2000-12-05  0.338154        NaN   NaN  340    NaN       NaN
    2000-12-06  0.338608        NaN   NaN  341    NaN       NaN
    2000-12-07  0.339061        NaN   NaN  342    NaN       NaN
    2000-12-08  0.339514        NaN   NaN  343    NaN       NaN
    2000-12-09  0.339968        NaN   NaN  344    NaN       NaN
    2000-12-10  0.340421        NaN   NaN  345    NaN       NaN
    2000-12-11  0.340875        NaN   NaN  346    NaN       NaN
    2000-12-12  0.341328        NaN   NaN  347    NaN       NaN
    2000-12-13  0.341782        NaN   NaN  348    NaN       NaN
    2000-12-14  0.342235        NaN   NaN  349    NaN       NaN
    2000-12-15  0.342688        NaN   NaN  350    NaN       NaN
    2000-12-16  0.343142        NaN   NaN  351    NaN       NaN
    2000-12-17  0.343595        NaN   NaN  352    NaN       NaN
    2000-12-18  0.344049  977097600  2000  353      1     19987
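To try the approach above end to end without the original data, here is a minimal sketch on a made-up frame. The three sample rows and the name `daily` are assumptions for illustration only. It also assumes a recent pandas version, where `resample` needs an explicit aggregation such as `.mean()`.

```python
import pandas as pd

# Hypothetical sample: observations on days 49, 52 and 55 of the year 2000.
df = pd.DataFrame({"JD": [49, 52, 55], "Value": [0.1, 0.4, 0.7]})

# Day-of-year 1 maps to 2000-01-01, so subtract one before adding the offset.
offsets = pd.to_timedelta(df["JD"].to_numpy() - 1, unit="D")
df.index = pd.Timestamp("2000-01-01") + offsets

# Upsample to one row per day; the inserted rows are NaN until interpolated.
daily = df.resample("D").mean()

# Fill the gaps linearly in both columns.
daily[["Value", "JD"]] = daily[["Value", "JD"]].interpolate(method="linear")

print(daily)
```

Building the index with `pd.to_timedelta` avoids the row-wise `apply`, which may matter on larger frames.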
You will notice that .interpolate() only backfills missing values at the beginning of the series, which is due to the scipy.interpolate.interp1d behavior for bounds_error, as described in the scipy docs.
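Edge handling can differ between pandas versions, so it may be worth checking on your own installation. The sketch below uses a made-up series (the names `s`, `forward` and `both` are illustrative) to compare the default direction with `limit_direction='both'`.

```python
import numpy as np
import pandas as pd

# Hypothetical series with NaNs at the start, in the middle and at the end.
s = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

# Default direction is forward: gaps are filled only after a valid value.
forward = s.interpolate(method="linear")

# 'both' also fills the gaps that come before the first valid value.
both = s.interpolate(method="linear", limit_direction="both")

print(pd.DataFrame({"original": s, "forward": forward, "both": both}))
```

In recent pandas releases the default leaves the leading NaNs untouched, while `limit_direction='both'` fills them with the nearest valid value rather than extrapolating a trend.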