Invert Time On Given Date In Dataframe
Solution 1:
You can do this simply enough with `shift`. The problem is the last row, which I'm still working out how best to reconstruct.
EDIT: I gave the last row my best shot, but it ends up being a clumsy mess, so I'd be happy for any feedback on it. In principle, `shift` makes this very easy. You could just drop `start` and `end` before adding the last row; I kept them to show how to do it with no data loss.
```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})
df = df[['date', 'start', 'end']]

saved_shift_ending = df['end'].iloc[-1]  # we want the end of the last shift
saved_end_date = df['date'].iloc[-1]     # we also want its date value

start_date = pd.Timestamp(df['date'].min())                      # midnight of the first day
end_date = pd.Timestamp(df['date'].max() + dt.timedelta(days=1))  # midnight after the last day

# Each free interval runs from the end of the previous shift to the start of this one
df['other_start'] = df['end'].shift(1)
df['other_end'] = df['start']
df.loc[0, 'other_start'] = start_date  # the first value is NaT after shift

# Extra row: free time from the end of the last shift until midnight
last_row = pd.DataFrame([[saved_end_date, pd.NaT, pd.NaT, saved_shift_ending, end_date]],
                        columns=['date', 'start', 'end', 'other_start', 'other_end'])
df = pd.concat([df, last_row], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
df.drop(['start', 'end'], axis=1, inplace=True)
print(df)
```
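The special-cased last row goes away if the free intervals are treated as pairs of consecutive boundaries: a sentinel at midnight of the first date followed by every shift end, paired with every shift start followed by a sentinel at midnight after the last date. Below is a minimal sketch of that idea, assuming the shifts are sorted by start time and do not overlap. `free_intervals` is a hypothetical helper, not part of the answer above, and it drops the `date` column.

```python
import datetime as dt

import pandas as pd


def free_intervals(shifts, start_col='start', end_col='end'):
    """Hypothetical helper: gaps between sorted, non-overlapping shifts.

    Assumes `shifts` has a 'date' column of datetime.date values, used only
    to pick the outer sentinels (midnight before/after the covered dates).
    """
    first = pd.Timestamp(shifts['date'].min())
    last = pd.Timestamp(shifts['date'].max() + dt.timedelta(days=1))
    # Free time starts at the first sentinel or at the end of each shift...
    other_start = pd.concat([pd.Series([first]), shifts[end_col]], ignore_index=True)
    # ...and ends at the start of each shift or at the last sentinel.
    other_end = pd.concat([shifts[start_col], pd.Series([last])], ignore_index=True)
    return pd.DataFrame({'other_start': other_start, 'other_end': other_end})


df = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})
free = free_intervals(df)
print(free)
```

On the sample data this produces the same four `other_start`/`other_end` pairs as the code above, with no row built by hand.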
Solution 2:
roganjosh answers the general case, but I needed daily "free time", so I had to add artificial day boundaries as rows with zero time between start and end. In the end, `.shift()` was what I was after.
I packed it into a function for reusability. If anyone has a more elegant solution, please feel free to share.
Here is my code:
```python
import datetime

import pandas as pd


def invertDailyTimes(df, dateCol, starttimeCol, endtimeCol):
    """
    Requires an input df with a date column (dateCol) and two timestamp columns
    (starttimeCol, endtimeCol) that is monotonically ordered in (starttimeCol, endtimeCol).
    """
    boundaries = []
    for d in df[dateCol].unique():
        next_day = d + datetime.timedelta(days=1)
        # Zero-length marker row at midnight at the start of day d
        first = df[df[dateCol] == d].iloc[0:1].copy()
        first[starttimeCol] = pd.Timestamp(d)
        first[endtimeCol] = pd.Timestamp(d)
        # Zero-length marker row at midnight at the end of day d
        last = first.copy()
        last[dateCol] = next_day
        last[starttimeCol] = pd.Timestamp(next_day)
        last[endtimeCol] = pd.Timestamp(next_day)
        boundaries.extend([first, last])
    # DataFrame.append was removed in pandas 2.0, so collect the rows and concat once
    df = pd.concat([df, *boundaries], ignore_index=True)
    df = df.drop_duplicates().sort_values(by=[starttimeCol, endtimeCol])
    # Each free interval runs from the previous end to the current start
    df['invert_start'] = df[endtimeCol].shift(1)
    df['invert_end'] = df[starttimeCol]
    gap = (df['invert_start'] - df['invert_end']).abs()
    # Keep non-empty gaps shorter than a day (the NaT in the first row is dropped too)
    df = df[(gap < pd.Timedelta(days=1)) & (gap > pd.Timedelta(seconds=0))].copy()
    df[starttimeCol] = df['invert_start']
    df[endtimeCol] = df['invert_end']
    return df.drop(columns=['invert_start', 'invert_end'])
```
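If the zero-length boundary rows feel roundabout, the per-day result can also be sketched with `groupby` on the date column, building each day's gaps directly from midnight, that day's shifts, and the following midnight. This is a different technique from the function above, not a refactor of it. It assumes every shift lies within its own date and that shifts within a day do not overlap; `daily_free_time` and its default column names are hypothetical.

```python
import datetime as dt

import pandas as pd


def daily_free_time(df, date_col='date', start_col='start', end_col='end'):
    """Hypothetical helper: free intervals per day, clipped to midnight boundaries."""
    rows = []
    for day, group in df.sort_values(start_col).groupby(date_col):
        day_start = pd.Timestamp(day)
        day_end = day_start + pd.Timedelta(days=1)
        # Gaps start at midnight or at a shift end...
        free_starts = [day_start] + list(group[end_col])
        # ...and end at a shift start or at the next midnight.
        free_ends = list(group[start_col]) + [day_end]
        for s, e in zip(free_starts, free_ends):
            if e > s:  # skip empty gaps, e.g. before a shift that starts at midnight
                rows.append({date_col: day, start_col: s, end_col: e})
    return pd.DataFrame(rows, columns=[date_col, start_col, end_col])


df = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})
free = daily_free_time(df)
print(free)
```

For the sample data from the first answer this yields 00:00-10:00 and 16:00-24:00 on 2019-04-04, and 04:00-14:00 and 18:00-24:00 on 2019-04-05; the empty gap before the shift starting at midnight is skipped.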