
Choosing The Minimum Distance

I have the following dataframe: data = {'id': [0, 0, 0, 0, 0, 0], 'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01

Solution 1:

I am not sure what format you expect for the output, so this approach brings the result to a point where you can extract whatever data you need:

Loading given data:

import pandas as pd
data = {'id': [0, 0, 0, 0, 0, 0],
'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01-03 00:07:00', '2019-01-03 00:10:00']}

df_data = pd.DataFrame(data)

df_data['time_order'] = pd.to_datetime(df_data['time_order'])
df_data['day_order'] = df_data['time_order'].dt.strftime('%Y-%m-%d')
df_data['time'] = df_data['time_order'].dt.strftime('%H:%M:%S') 

Calculating half the difference between x and y (here 00:07:30), which is the target time to measure against:

x = '00:00:00'
y = '00:15:00'
diff = (pd.Timedelta(y) - pd.Timedelta(x)) / 2

Creating a new column 'diff' as timedelta:

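# 'time' holds strings (from strftime), so convert to timedelta before subtracting
df_data['diff'] = abs(pd.to_timedelta(df_data['time']) - diff)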
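
As a side note, the distance can also be computed without going through a string column. The following is a minimal sketch, assuming a small hypothetical sample that mirrors the first day of the question's data; the names `ts`, `time_of_day`, `target` and `distance` are illustrative only:

```python
import pandas as pd

# Hypothetical sample mirroring the first day of the question's data.
ts = pd.Series(pd.to_datetime(['2019-01-01 00:00:00', '2019-01-01 00:11:00']))

# Time of day as a timedelta: each timestamp minus midnight of its own day.
time_of_day = ts - ts.dt.normalize()

# Same target as above: half the difference between 00:00:00 and 00:15:00.
target = (pd.Timedelta('00:15:00') - pd.Timedelta('00:00:00')) / 2

# Absolute distance of each time of day from the target.
distance = (time_of_day - target).abs()
print(distance)
```

This keeps everything in datetime/timedelta types, so no parsing of formatted strings is needed.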

Grouping (based on date) and apply:

mins = df_data.groupby('day_order').apply(lambda x: x[x['diff']==min(x['diff'])])

Removing Index (optional):

mins.reset_index(drop=True, inplace=True)
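
If only one row per day is needed, sorting and dropping duplicates is a possible alternative to groupby/apply. This is a sketch on a hypothetical frame `df` with the same `day_order` and `diff` columns as above; note that, unlike the approach shown earlier, it keeps only a single row per day even when several rows tie for the minimum:

```python
import pandas as pd

# Hypothetical frame with the same column names as the answer's DataFrame.
df = pd.DataFrame({
    'day_order': ['2019-01-01', '2019-01-01', '2019-01-02', '2019-01-02'],
    'diff': pd.to_timedelta(['00:07:30', '00:03:30', '00:03:30', '00:07:30']),
})

# Sort so the smallest 'diff' comes first within each day,
# then keep only the first row for every day.
closest = (
    df.sort_values(['day_order', 'diff'])
      .drop_duplicates('day_order', keep='first')
      .reset_index(drop=True)
)
print(closest)
```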

Output DataFrame:

>>> mins
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:03:30
1   0 2019-01-02 00:04:00  2019-01-02  00:04:00 0 days 00:03:30
2   0 2019-01-03 00:07:00  2019-01-03  00:07:00 0 days 00:00:30

Making list of difference in seconds:

a = list(mins['diff'].apply(lambda x:x.seconds))

Output:

>>> a
[210, 210, 30]
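
One caveat: the `.seconds` attribute of a timedelta returns only the seconds component within a day, not the full duration. That is fine here because every difference is under 24 hours, but for longer spans `total_seconds()` is the safer choice. A small sketch with hypothetical values (the third one is deliberately longer than a day):

```python
import pandas as pd

# Hypothetical differences; the last one exceeds 24 hours.
diffs = pd.Series(pd.to_timedelta(['00:03:30', '00:00:30', '1 days 00:00:30']))

# Seconds component only: the whole day in the last value is ignored.
seconds_component = diffs.dt.seconds.tolist()

# Full duration in seconds, including whole days.
total = diffs.dt.total_seconds().astype(int).tolist()

print(seconds_component)
print(total)
```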
