
Invert Time On Given Date In Dataframe

For a dataframe containing start and end times I would like to 'invert' its times for a given date. There certainly is a 'brute force' method to do it by looping through the dataf

Solution 1:

You can do this simply enough with shift: each gap starts where the previous row ends and stops where the current row starts. The awkward part is the last row, which has to be reconstructed separately.

EDIT: I gave it my best shot on the last row, but it ends up being a clumsy mess, and I'd be happy for any feedback on it. In principle, using shift makes this very easy. You could just drop start and end before adding the last row; I went with showing how to do it with no data loss.

import pandas as pd
import numpy as np
import datetime as dt

df = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})

df = df[['date', 'start', 'end']]

saved_shift_ending = df['end'].iloc[-1]  # we want end of last shift
saved_end_date = df['date'].iloc[-1]     # we also want the date value

start_date = df['date'].min()
end_date = (df['date'].max() + dt.timedelta(days=1))

df['other_start'] = df['end'].shift(1)
df['other_end'] = df['start']

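# The first value is NaT after shift; fill it with midnight of the first date,
# converted to a Timestamp so the column keeps its datetime dtype
df.loc[0, 'other_start'] = pd.Timestamp(start_date)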

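# Extra row for the gap between the end of the last shift and the closing midnight
last_row = pd.DataFrame([[saved_end_date,   # keep a date object, like the other rows
                          pd.NaT,
                          pd.NaT,
                          saved_shift_ending,
                          pd.Timestamp(end_date)]],
                        columns=['date', 'start', 'end', 'other_start',
                                 'other_end'])

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, last_row], ignore_index=True)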

df.drop(['start', 'end'], axis=1, inplace=True)
print(df)
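
On the last-row question, one possible way to avoid the special case is to build the gap boundaries as two offset lists instead of shifting in place. This is only a sketch, assuming the intervals do not overlap; the function name `invert_intervals` and its `window_start` / `window_end` parameters are made up for illustration and are not part of the answer above.

```python
import pandas as pd


def invert_intervals(df, window_start, window_end,
                     start_col="start", end_col="end"):
    """Return the gaps between non-overlapping intervals inside a window."""
    df = df.sort_values(start_col)
    # Every gap starts at the window start or at the end of an interval ...
    gap_starts = [window_start] + df[end_col].tolist()
    # ... and ends at the start of the next interval or at the window end.
    gap_ends = df[start_col].tolist() + [window_end]
    gaps = pd.DataFrame({"other_start": gap_starts, "other_end": gap_ends})
    # Drop zero-length gaps (an interval touching its neighbour or the window edge)
    return gaps[gaps["other_end"] > gaps["other_start"]].reset_index(drop=True)


shifts = pd.DataFrame({
    "start": [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0),
              pd.Timestamp(2019, 4, 5, 14)],
    "end": [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4),
            pd.Timestamp(2019, 4, 5, 18)],
})
gaps = invert_intervals(shifts, pd.Timestamp(2019, 4, 4), pd.Timestamp(2019, 4, 6))
print(gaps)
```

Because the window edges are prepended and appended up front, the first and last gaps come out of the same code path as the ones in the middle.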

Solution 2:

roganjosh answers the general case. However, I needed to look at daily "free time", and for that I had to add artificial date boundaries as rows with zero time between start and end. In the end, .shift() was what I was after. I packed it into a function to make it reusable; if anyone has a more elegant solution, please feel free to share.

Here is my code:

import datetime

import pandas as pd


def invertDailyTimes(df, dateCol, starttimeCol, endtimeCol):
    """
    Requires an input df with a date column (dateCol) and two timestamp
    columns (starttimeCol, endtimeCol), ordered monotonically by
    (starttimeCol, endtimeCol).
    """
    one_day = datetime.timedelta(days=1)
    boundaries = []
    for d in df[dateCol].unique():
        # Zero-length marker rows at midnight of day d and midnight of day d + 1
        for day in (d, d + one_day):
            row = df[df[dateCol] == d].iloc[0:1].copy()
            row[dateCol] = day
            row[starttimeCol] = pd.Timestamp(day)
            row[endtimeCol] = pd.Timestamp(day)
            boundaries.append(row)

    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    df = pd.concat([df] + boundaries, ignore_index=True)
    df = df.drop_duplicates()
    df = df.sort_values(by=[starttimeCol, endtimeCol])

    # A gap runs from the end of the previous row to the start of this row
    df['invert_start'] = df[endtimeCol].shift(1)
    df['invert_end'] = df[starttimeCol]

    # Keep only real gaps: longer than zero and shorter than one day
    gap = (df['invert_end'] - df['invert_start']).abs()
    df = df[(gap > pd.Timedelta(seconds=0)) & (gap < pd.Timedelta(days=1))].copy()

    df[starttimeCol] = df['invert_start']
    df[endtimeCol] = df['invert_end']
    return df.drop(columns=['invert_start', 'invert_end'])
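
For comparison, the daily boundaries can also be handled per group rather than as inserted marker rows. The following is a rough sketch, not the approach used above: it assumes intervals do not overlap and that none crosses midnight, and the names `daily_free_time`, `free_start` and `free_end` are invented for this example.

```python
import pandas as pd


def daily_free_time(df, start_col="start", end_col="end"):
    """Gaps within each calendar day; days without any interval are skipped."""
    pieces = []
    # Group rows by the midnight of the day on which each interval starts
    for day, group in df.groupby(df[start_col].dt.normalize()):
        group = group.sort_values(start_col)
        free = pd.DataFrame({
            "date": day.date(),
            "free_start": [day] + group[end_col].tolist(),
            "free_end": group[start_col].tolist() + [day + pd.Timedelta(days=1)],
        })
        # Drop zero-length gaps, e.g. when an interval starts exactly at midnight
        pieces.append(free[free["free_end"] > free["free_start"]])
    return pd.concat(pieces, ignore_index=True)


shifts = pd.DataFrame({
    "start": [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0),
              pd.Timestamp(2019, 4, 5, 14)],
    "end": [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4),
            pd.Timestamp(2019, 4, 5, 18)],
})
free = daily_free_time(shifts)
print(free)
```

Here each gap is labelled with the day it belongs to, since the day comes from the group key rather than from a neighbouring row.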
