Skip to content Skip to sidebar Skip to footer

Find Percentile In Pandas Dataframe Based On Groups

Season Name value 2001 arkansas 3.497 2002 arkansas 3.0935 2003 arkansas 3.3625 2015 arkansas 3.766 2001 colorado 2.21925 2002 colorado 1.47

Solution 1:

You can use groupby + quantile:

df.groupby('Name')['value'].quantile([.1, .9])

Name
arkansas  0.13.1742000.93.685300
colorado  0.11.6207250.92.656375
Name: value, dtype: float64

And then call np.searchsorted.


Alternatively, use qcut.

df.groupby('Name').apply(lambda x: 
       pd.qcut(x['value'], [.1, .9]))

Namearkansas0     (3.173, 3.685]
          1                NaN
          2     (3.173, 3.685]
          3                NaN
colorado  4      (1.62, 2.656]
          5                NaN
          6                NaN
          7      (1.62, 2.656]
          8      (1.62, 2.656]
          9      (1.62, 2.656]
          10     (1.62, 2.656]
          11     (1.62, 2.656]
Name: value, dtype: object

Solution 2:

If the variable for your dataframe is df, this should work. I'm not sure what you want your output to look like, but I just created code for a dictionary, where each key is a state. Also, since you have very few values, I used the option "nearest" for the argument interpolation (the default value is interpolation). To see the possible options, check out the documentation for the function here.

import pandas as pd
import numpy as np
df = pd.read_csv('stacktest.csv')
#array of unique state names from the dataframe
states = np.unique(df['Name'])
#empty dictionary 
state_data = dict()
for state in states:
    state_data[state] = np.percentile(df[df['Name'] == state]['value'],[10,90],interpolation = 'nearest')
print(state_data)

enter image description here

Post a Comment for "Find Percentile In Pandas Dataframe Based On Groups"