Skip to content Skip to sidebar Skip to footer

Efficient Way To Do The Pandas Dataframe Column Lookup

I have the following dataframe: +-------------------------------------------+----------------------------------------+----------------+----------------------------------+ |

Solution 1:

Solution tested with missing values in some column(s), but in Lookup are not NaNs or Nones:

df = pd.concat([df[x].str.split(',', expand=True).stack() for x in df.columns], axis=1, keys=df.columns)
df = df.reset_index(level=1, drop=True).set_index('Lookup', append=True).unstack().sort_index(axis=1, level=1)
df.columns = [f'{b}_{a}' for a, b in df.columns]

Idea is split each value in loop, explode for Series and concat together, last reshape by stack:

df = pd.concat([df[x].str.split(',').explode() for x in df.columns], axis=1)
df = df.set_index('Lookup', append=True).unstack().sort_index(axis=1, level=1)
df.columns = [f'{b}_{a}'for a, b in df.columns]
print (df)
  100000_LookUp Value 1 100000_LookUp Value 2 100000_LookUp Value 3  \
0                 -1820                     1                  1820   
1                 -1360                     2                  4120   

  1000000_LookUp Value 1 1000000_LookUp Value 2 1000000_LookUp Value 3  \
0                  -1820                      1                   1820   
1                 -28760                      3                  31520   

  200000_LookUp Value 1 200000_LookUp Value 2 200000_LookUp Value 3  \
0                 -1820                     1                  1820   
1                 -1360                     2                  4120   

  300000_LookUp Value 1 300000_LookUp Value 2 300000_LookUp Value 3  \
0                 -1820                     1                  1820   
1                -28760                     3                 31520   

  50000_LookUp Value 1 50000_LookUp Value 2 50000_LookUp Value 3  \
0                -1820                    1                 1820   
1                -1360                    2                 4120   

  500000_LookUp Value 1 500000_LookUp Value 2 500000_LookUp Value 3  
0                 -1820                     1                  1820  
1                -28760                     3                 31520  

Solution 2:

At the core, you can use groupby very well to achieve your goal:

grouped = df.groupby("Lookup")

This is now a dict-like object that has the values you want for every Lookup value in separate dataframes. The question now is how we get it back together again, and here I have to resort to a quite hacky method. I'm sure there are better ones, but this one does produce a nice result.

dflist = []
keylist = []
basecols = df.columns[1:]

for key, df in grouped.__iter__():
    keylist.append(key)
    dflist.append(df[basecols].reset_index(drop=True)

result = pd.concat(dflist, axis=1)
resultcolumns = pd.MultiIndex.from_product([keylist, basecols])
result.columns = resultcolumns

This produces a MultiIndexed DataFrame with the result you described.

Output:

>> result
   500001000002000003000005000001000000
   Value1 Value2 Value3  Value1 Value2 Value3  Value1 Value2 Value3  Value1 Value2 Value3  Value1 Value2 Value3  Value1 Value2 Value3
0   -182011820   -182011820   -182011820   -182011820   -182011820   -1820118201   -136024120   -136024120   -136024120  -28760331520  -28760331520  -28760331520

Post a Comment for "Efficient Way To Do The Pandas Dataframe Column Lookup"