Calculating The Number Of Years In A Pandas Dataframe
Solution 1:
It seems more reasonable to just access the year attribute and then take the len of the unique values:
In [1]:
import datetime as dt
import numpy as np
import pandas as pd
In [2]:
s = pd.date_range(dt.datetime(1900,1,1), end=dt.datetime(2000,1,1), freq='6M')
s
Out[2]:
DatetimeIndex(['1900-01-31', '1900-07-31', '1901-01-31', '1901-07-31',
'1902-01-31', '1902-07-31', '1903-01-31', '1903-07-31',
'1904-01-31', '1904-07-31',
...
'1995-01-31', '1995-07-31', '1996-01-31', '1996-07-31',
'1997-01-31', '1997-07-31', '1998-01-31', '1998-07-31',
'1999-01-31', '1999-07-31'],
dtype='datetime64[ns]', length=200, freq='6M')
In [8]:
len(np.unique(s.year))
Out[8]:
100
This way it handles infrequent periods, missing days, entries that overlap year boundaries, and so on.
You can also convert the index to a Series and call nunique:
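To make that concrete, here is a small sketch with a made-up, irregular index (the dates are purely illustrative): it has a duplicate day, a pair straddling a year boundary, and no entries at all for 2021.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular timestamps: a duplicate, a year-boundary pair, and a gap (no 2021).
idx = pd.DatetimeIndex([
    "2019-12-31", "2020-01-01", "2020-06-15",
    "2020-06-15", "2022-03-01",
])

# idx.year gives one integer per timestamp; np.unique keeps the distinct ones.
n_years = len(np.unique(idx.year))
print(n_years)  # 3 -> 2019, 2020, 2022
```

Note that this counts the years that actually appear in the data, not the span between the first and last date.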
In [11]:
s.to_series().dt.year.nunique()
Out[11]:
100
Since you already have the dates in a datetime column, this is all you need:
df['date_col'].dt.year.nunique()
If necessary you can convert to datetime using:
df['date_col'] = pd.to_datetime(df['date_col'])
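Putting the conversion and the count together, a minimal sketch might look like this. The column name `date_col` comes from the snippet above; the sample values and the `errors="coerce"` choice are assumptions for illustration, so unparseable strings become NaT instead of raising.

```python
import pandas as pd

# Hypothetical string column, including one value that is not a date.
df = pd.DataFrame({"date_col": ["2001-05-01", "2001-11-30", "2003-02-14", "not a date"]})

# errors="coerce" turns unparseable entries into NaT rather than raising.
df["date_col"] = pd.to_datetime(df["date_col"], errors="coerce")

# nunique() skips missing values by default, so the NaT row is not counted.
n_years = df["date_col"].dt.year.nunique()
print(n_years)  # 2 -> 2001 and 2003
```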
UPDATE
It seems your requirement is to count complete years. If you set the index to the year and day components, you can count the rows at the year level and then keep only the years with at least 365 rows. The number of years that remain is the number of complete years:
In [34]:
df = pd.DataFrame({'date': pd.date_range(dt.datetime(1900,6,1), end=dt.datetime(1910,6,1))})
# on pandas >= 2.0, count(level=0) is gone; use .groupby(level=0).count() instead
count = df.set_index([df['date'].dt.year, df['date'].dt.day]).count(level=0)
count
Out[34]:
      date
date
1900   214
1901   365
1902   365
1903   365
1904   366
1905   365
1906   365
1907   365
1908   366
1909   365
1910   152
In [39]:
len(count[count>=365].dropna())
Out[39]:
9
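One caveat with a fixed `>= 365` threshold is that a leap year missing a single day would still pass. A leap-aware variant could look like the sketch below; it is not from the answer above, just one way to do it, and it assumes at most the calendar days matter (duplicate timestamps on the same day are collapsed first).

```python
import calendar
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("1900-06-01", "1910-06-01")})

# Distinct calendar days observed per year (time of day and duplicates ignored).
days_per_year = (
    df["date"].dt.normalize().drop_duplicates().dt.year.value_counts().sort_index()
)

# A year is complete only if every one of its 365 or 366 days is present.
complete = [
    year for year, n_days in days_per_year.items()
    if n_days == (366 if calendar.isleap(year) else 365)
]
print(len(complete))  # 9 -> 1901 through 1909
```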
Solution 2:
Just divide the length of the index by 365. This assumes one row per calendar day and ignores leap days:
def numyears(x):
    return len(x.index) // 365  # a plain / does the same thing in Python 2
As written, this does integer division and matches the output you have here. It works for a dataframe of any size.
EDIT: The above solution takes a floor approach. After re-reading the post, the OP used a ceiling approach. For that:
    return len(x.index) // 365 + 1
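For comparison, here is a sketch of both roundings side by side. The function names and the `days_per_year` parameter are made up for illustration. It uses `math.ceil` for the ceiling, because adding 1 to the floored value over-counts when the length is an exact multiple of 365.

```python
import math
import pandas as pd

def num_years_floor(frame, days_per_year=365):
    # Whole years only; assumes one row per day.
    return len(frame.index) // days_per_year

def num_years_ceil(frame, days_per_year=365):
    # Any started year counts as a full one.
    return math.ceil(len(frame.index) / days_per_year)

exact = pd.DataFrame(index=pd.date_range("2000-01-01", periods=730))
partial = pd.DataFrame(index=pd.date_range("2000-01-01", periods=800))

print(num_years_floor(exact), num_years_ceil(exact))      # 2 2
print(num_years_floor(partial), num_years_ceil(partial))  # 2 3
```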
Solution 3:
You can create a set out of all the years, and then return its length.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(365*10, 3),
columns=list('ABC'),
index=pd.date_range('2000-1-1', periods=365*10))
# Number of unique years
>>> len(set(ts.year for ts in df.index))
10
If your index holds Python datetime objects or strings instead of being a DatetimeIndex, convert it with pd.to_datetime first so that each element is a Timestamp:
>>> len(set(ts.year for ts in pd.to_datetime(df.index)))
10
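If you would rather avoid the Python-level loop, the same count can probably be had from the index directly. This is a sketch under the assumption that the index is made of ISO-formatted date strings; the string index here is built only to demonstrate the conversion step.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=365 * 10)

# Hypothetical frame whose index holds date strings rather than Timestamps.
df = pd.DataFrame(np.random.randn(365 * 10, 3),
                  columns=list("ABC"),
                  index=dates.astype(str))

# to_datetime turns the string index into a DatetimeIndex; .year is then vectorized.
n_years = pd.to_datetime(df.index).year.nunique()
print(n_years)  # 10 -> 2000 through 2009
```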