
Calculating The Number Of Years In A Pandas Dataframe

I've written a messy function which calculates the number of years in a dataframe based on its length (assuming the dataframe has a value for each day of the year). It works fine,

Solution 1:

It seems more reasonable to access the year attribute directly and count its unique values:

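In [2]:
import datetime as dt
import numpy as np
import pandas as pd
s = pd.date_range(dt.datetime(1900,1,1), end=dt.datetime(2000,1,1), freq='6M')  # month-end; pandas >= 2.2 spells this '6ME'
s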

Out[2]:
DatetimeIndex(['1900-01-31', '1900-07-31', '1901-01-31', '1901-07-31',
               '1902-01-31', '1902-07-31', '1903-01-31', '1903-07-31',
               '1904-01-31', '1904-07-31',
               ...
               '1995-01-31', '1995-07-31', '1996-01-31', '1996-07-31',
               '1997-01-31', '1997-07-31', '1998-01-31', '1998-07-31',
               '1999-01-31', '1999-07-31'],
              dtype='datetime64[ns]', length=200, freq='6M')

In [8]:
len(np.unique(s.year))

Out[8]:
100

This way it handles infrequent sampling, missing days, entries that straddle year boundaries, and so on.
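A minimal sketch of that claim, using a made-up index with large gaps and a run of days across a year boundary. Counting unique years does not depend on how many rows fall in each year, whereas a length-based estimate does:

```python
import numpy as np
import pandas as pd

# Hypothetical, irregularly spaced dates: big gaps, sparse years,
# and a run of days that crosses the 2001/2002 boundary
idx = pd.DatetimeIndex([
    "2000-03-15", "2000-11-02",
    "2001-12-30", "2001-12-31", "2002-01-01", "2002-01-02",
    "2005-07-04",
])

# Unique calendar years present, regardless of rows per year
n_years = len(np.unique(idx.year))
print(n_years)  # 4  (2000, 2001, 2002, 2005)

# A divide-by-365 estimate is meaningless for data like this
print(len(idx) // 365)  # 0
```

Note that this counts every calendar year that appears at least once, even a year represented by a single row.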

You can also convert the index to a Series and call nunique:

In [11]:
s.to_series().dt.year.nunique()

Out[11]:
100
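A minimal sketch, assuming a daily index instead of the 6-month one above: DatetimeIndex.year already returns an Index, and Index has its own nunique(), so the conversion to a Series is optional.

```python
import pandas as pd

# Daily index covering 1900-01-01 through 1999-12-31
s = pd.date_range("1900-01-01", "1999-12-31", freq="D")

# .year gives an Index of ints; nunique() works on it directly
n_years = s.year.nunique()
print(n_years)  # 100
```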

Since you already have the dates in a datetime column, this is all you need:

df['date_col'].dt.year.nunique()

If necessary, convert the column to datetime first:

df['date_col'] = pd.to_datetime(df['date_col'])
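A small sketch of why the conversion matters, using made-up string dates: the .dt accessor only works on datetime-like columns, so a column read in as text has to go through pd.to_datetime before .dt.year is available.

```python
import pandas as pd

# Hypothetical data: dates stored as strings, as they often are after read_csv
df = pd.DataFrame({"date_col": ["2019-12-31", "2020-01-01", "2020-06-15", "2021-03-02"]})

# Convert to datetime64 so that the .dt accessor can be used
df["date_col"] = pd.to_datetime(df["date_col"])

n_years = df["date_col"].dt.year.nunique()
print(n_years)  # 3  (2019, 2020, 2021)
```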

UPDATE

So it seems your requirement is to count complete years. If you set the index to the year and day components, you can count rows at the year level and then keep only the years with at least 365 entries; the number of rows left is the number of complete years:

In [34]:
df = pd.DataFrame({'date': pd.date_range(dt.datetime(1900,6,1), end=dt.datetime(1910,6,1))})
count = df.set_index([df['date'].dt.year, df['date'].dt.day]).groupby(level=0).count()
count

Out[34]:
      date
date
1900   214
1901   365
1902   365
1903   365
1904   366
1905   365
1906   365
1907   365
1908   366
1909   365
1910   152

In [39]:
len(count[count >= 365].dropna())

Out[39]:
9
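A shorter sketch of the same complete-year count, assuming at most one row per day: group directly on the year and count rows with size(). The >= 365 test would also accept a leap year that is missing one day, so the sketch adds a stricter comparison against each year's true length using calendar.isleap; that stricter check goes beyond the original answer.

```python
import calendar
import datetime as dt

import pandas as pd

# Daily data from 1900-06-01 to 1910-06-01, as in the example above
df = pd.DataFrame({"date": pd.date_range(dt.datetime(1900, 6, 1), end=dt.datetime(1910, 6, 1))})

# Rows per calendar year (assumes at most one row per day)
per_year = df.groupby(df["date"].dt.year).size()

# Loose check: at least 365 rows in the year
complete_years = int((per_year >= 365).sum())

# Stricter check: the row count equals the year's real length (365 or 366)
expected = pd.Series(
    [366 if calendar.isleap(int(y)) else 365 for y in per_year.index],
    index=per_year.index,
)
strict_complete = int((per_year == expected).sum())

print(complete_years, strict_complete)  # 9 9
```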

Solution 2:

Just divide the length of the index by 365 (this assumes exactly one row per day, as in the question):

def numyears(x):
     return len(x.index) // 365  # floor division; in Python 2, / on ints already floors

As written, this does integer (floor) division and matches the output you have in the question. It works for a dataframe of any size.
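A minimal sketch of the floor version on a hypothetical daily frame, showing that a partial year is simply dropped:

```python
import pandas as pd

def numyears(x):
    # Whole years, assuming exactly one row per day; partial years are dropped
    return len(x.index) // 365

# Hypothetical frame with 3650 daily rows starting 2000-01-01
df = pd.DataFrame({"value": range(3650)},
                  index=pd.date_range("2000-01-01", periods=3650))

print(numyears(df))             # 10
print(numyears(df.iloc[:912]))  # 2  (about two and a half years, floored)
```

Because the divisor is fixed at 365, leap days are ignored: the 3,650 rows above actually end on 2009-12-28, a few days short of ten calendar years.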

EDIT: The solution above takes a floor approach. After re-reading the post, I see the OP used a ceiling approach. For that, round up instead:

return -(-len(x.index) // 365)  # ceiling division: any partial year counts as a whole one
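A quick check of how true ceiling division behaves, using a hypothetical helper numyears_ceil that takes a row count instead of a dataframe:

```python
def numyears_ceil(n_rows):
    # Integer-only ceiling division: -(-a // b) rounds a / b up
    return -(-n_rows // 365)

print(numyears_ceil(3650))  # 10  exact multiple, no extra year
print(numyears_ceil(3651))  # 11  one extra day rounds up
print(numyears_ceil(1461))  # 5   four calendar years, one of them a leap year
```

The last line shows the catch with any divide-by-365 rule: four complete calendar years of daily data (1,461 rows) round up to 5, so counting distinct years is more reliable when the data is calendar-aligned.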

Solution 3:

You can create a set out of all the years, and then return its length.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(365*10, 3), 
                  columns=list('ABC'), 
                  index=pd.date_range('2000-1-1', periods=365*10))

# Number of unique years
>>> len(set(ts.year for ts in df.index))
10

If your index is not already a DatetimeIndex (for example, it holds date strings), convert it with pd.to_datetime first:

>>> len(set(ts.year for ts in pd.to_datetime(df.index)))
10
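For comparison, a sketch that rebuilds the same frame and checks that the set-based count agrees with the vectorized df.index.year.nunique(), including when the index starts out as strings (the string index is produced with strftime purely for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(365 * 10, 3),
                  columns=list("ABC"),
                  index=pd.date_range("2000-1-1", periods=365 * 10))

# Set-based count: loops over Timestamps in Python
via_set = len(set(ts.year for ts in df.index))

# Vectorized count: DatetimeIndex.year is an Index, which has nunique()
via_nunique = df.index.year.nunique()

# String index (built here only for illustration): convert, then count
str_index = df.index.strftime("%Y-%m-%d")
via_strings = pd.to_datetime(str_index).year.nunique()

print(via_set, via_nunique, via_strings)  # 10 10 10
```

The vectorized form avoids a per-row Python loop, which matters on large frames.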
