Skip to content Skip to sidebar Skip to footer

Parse Multiple Date Formats Into A Single Format

I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip nan values, convert all the other formats to %Y-%-

Solution 1:

First was added 2 new formats to fmt list:

fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
     '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month',
     'month: %d;Year','%Y','%b %d %Y','%b %Y %d',
     '%Y %b %d', '%Y %b']

Then in list comprehension convert column to datetimes, parameter errors='coerce' is for non matched values to missing values. Last join together by concat.

Last because possible multiple values per rows because dd/mm/YYYY vs mm/dd/YYYY formats (not sure if month of day) is used back filling with select first column. It means what format is first in list it is selected with high priority.

dfs = [pd.to_datetime(df['publish_time'], format=f, errors='coerce') for f infmt]
df['publish_time1']= pd.concat(dfs, axis=1).bfill(axis=1).iloc[:, 0]

print(df)publish_timepublish_time102014 Jul222014-07-2212003 Aug2003-08-0122019 Nov262019-11-2632012-12-07    2012-12-0742020 Jan212020-01-2152015-01-01    2015-01-0162010-11-30    2010-11-3072007-05-10    2007-05-1082020    2020-01-0192012-02-29    2012-02-29102016 Apr192016-04-19112006-12-31    2006-12-31122013 Jun272013-06-27132019 Jun192019-06-19142015 Jun122015-06-12152006 Jun-DecNaT162006-07-31    2006-07-3117NaNNaT182017-04-15    2017-04-15192016 May222016-05-22202020 Feb2020-02-01212017 May62017-05-06222020 Mar112020-03-11232013-04-30    2013-04-30242020-03-07    2020-03-0725NaNNaT262018    2018-01-01

Post a Comment for "Parse Multiple Date Formats Into A Single Format"