Skip to content Skip to sidebar Skip to footer

Cartesian Product Of A Pandas Dataframe With Itself

Given a dataframe: id value 0 1 a 1 2 b 2 3 c I want to get a new dataframe that is basically the cartesian product of each row with each other row exclu

Solution 1:

We want to get the indices for the upper and lower triangles of a square matrix. Or in other words, where the identity matrix is zero

np.eye(len(df))

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

So I subtract it from 1 and

array([[ 0.,  1.,  1.],
       [ 1.,  0.,  1.],
       [ 1.,  1.,  0.]])

In a boolean context and passed to np.where I get exactly the upper and lower triangle indices.

i, j = np.where(1- np.eye(len(df)))
df.iloc[i].reset_index(drop=True).join(
    df.iloc[j].reset_index(drop=True), rsuffix='_2')

   id value  id_2 value_2
01     a     2       b
11     a     3       c
22     b     1       a
32     b     3       c
43     c     1       a
53     c     2       b

Solution 2:

I had this problem before , this is my solution ..

import itertools
import pandas as pd 
c = list(itertools.product(DF.id.tolist(), DF.id.tolist()))
Dic=dict(zip(DF.id, DF.value))
df = pd.DataFrame(c , columns=['id', 'id_2'])
df[['value','value_2']]=df.apply(lambda x:x.map(Dic))
df.loc[df.id!=df.id_2,:]


Out[125]: 
   id  id_2 value value_2
112     a       b
213     a       c
321     b       a
523     b       c
631     c       a
732     c       b

Solution 3:

This can be done entirely in pandas:

df.loc[:, 'key_col'] =1 # create a joincolumn that will give us the Cartesian Product

(df.merge(df, df, on='key_col', suffixes=('', '_2'))
 .query('id != id_2') # filterout joins on the same row
 .drop('key_col', axis=1)
 .reset_index(drop=True))

Or if you don't want to have to drop the dummy column, you can temporarily create it when calling df.merge:

(df.merge(df, on=df.assign(key_col=1)['key_col'], suffixes=('', '_2'))
 .query('id != id_2') # filterout joins on the same row
 .reset_index(drop=True))

Post a Comment for "Cartesian Product Of A Pandas Dataframe With Itself"