Cartesian Product Of A Pandas Dataframe With Itself
Given a dataframe: id value 0 1 a 1 2 b 2 3 c I want to get a new dataframe that is basically the cartesian product of each row with each other row exclu
Solution 1:
We want to get the indices for the upper and lower triangles of a square matrix. Or in other words, where the identity matrix is zero
np.eye(len(df))
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
So I subtract it from 1 and
array([[ 0., 1., 1.],
[ 1., 0., 1.],
[ 1., 1., 0.]])
In a boolean context and passed to np.where
I get exactly the upper and lower triangle indices.
i, j = np.where(1- np.eye(len(df)))
df.iloc[i].reset_index(drop=True).join(
df.iloc[j].reset_index(drop=True), rsuffix='_2')
id value id_2 value_2
01 a 2 b
11 a 3 c
22 b 1 a
32 b 3 c
43 c 1 a
53 c 2 b
Solution 2:
I had this problem before , this is my solution ..
import itertools
import pandas as pd
c = list(itertools.product(DF.id.tolist(), DF.id.tolist()))
Dic=dict(zip(DF.id, DF.value))
df = pd.DataFrame(c , columns=['id', 'id_2'])
df[['value','value_2']]=df.apply(lambda x:x.map(Dic))
df.loc[df.id!=df.id_2,:]
Out[125]:
id id_2 value value_2
112 a b
213 a c
321 b a
523 b c
631 c a
732 c b
Solution 3:
This can be done entirely in pandas:
df.loc[:, 'key_col'] =1 # create a joincolumn that will give us the Cartesian Product
(df.merge(df, df, on='key_col', suffixes=('', '_2'))
.query('id != id_2') # filterout joins on the same row
.drop('key_col', axis=1)
.reset_index(drop=True))
Or if you don't want to have to drop the dummy column, you can temporarily create it when calling df.merge
:
(df.merge(df, on=df.assign(key_col=1)['key_col'], suffixes=('', '_2'))
.query('id != id_2') # filterout joins on the same row
.reset_index(drop=True))
Post a Comment for "Cartesian Product Of A Pandas Dataframe With Itself"