
Pandas: Create Column Id Based On Intersections On Rows

I have a pandas DataFrame as follows: and I need to create a new column ID taking into consideration all the intersections between values in columns id1, id2 and id3. The output r

Solution 1:

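Use DataFrame.melt to unpivot id2 and id3 into a single value column, so that every row becomes an (id1, value) pair that can be passed as an edge to nx.from_pandas_edgelist. Then get all connected_components of the graph, build a dictionary that maps each node to its component number, and finally use Series.map on id1 to create the new column: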

import networkx as nx

# Unpivot id2 and id3 into one 'value' column: each row is an (id1, value) edge
df1 = df.melt(id_vars='id1', value_vars=['id2', 'id3'])

# Create the graph from the edge list
g = nx.from_pandas_edgelist(df1, 'id1', 'value')

connected_components = nx.connected_components(g)

# Map every node to the 1-based id of its component
node2id = {}
for cid, component in enumerate(connected_components, start=1):
    for node in component:
        node2id[node] = cid

df['g'] = df['id1'].map(node2id)
print(df)
  id1 id2 id3  g
0   a   x   u  1
1   a   y   j  1
2   b   x   t  1
3   c   z   r  2
4   d   p   r  2
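
To see what the connected-components step is doing, here is a minimal dependency-free sketch of the same grouping using a union-find structure instead of networkx. It is an illustration, not the answer's method: the function name component_ids and the plain list of tuples are assumptions made for the example, with the rows copied from the printed output above. As in the graph version, values from all three columns share one namespace, so an identical string in two different columns counts as the same node.

```python
def component_ids(rows):
    """Give each row a 1-based group id; rows linked by shared values get the same id."""
    parent = {}

    def find(node):
        # Locate the root of node's set, then compress the path to it.
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    def union(a, b):
        # Merge the sets containing a and b.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link the first value of every row to the others (the id1 -> value edges).
    for first, *rest in rows:
        find(first)
        for other in rest:
            union(first, other)

    # Number the components in order of first appearance.
    root_to_id = {}
    ids = []
    for first, *_ in rows:
        root = find(first)
        ids.append(root_to_id.setdefault(root, len(root_to_id) + 1))
    return ids


# Rows taken from the printed output above: (id1, id2, id3)
rows = [
    ("a", "x", "u"),
    ("a", "y", "j"),
    ("b", "x", "t"),
    ("c", "z", "r"),
    ("d", "p", "r"),
]
print(component_ids(rows))  # [1, 1, 1, 2, 2]
```

Rows 0 and 2 end up together because they share x, row 1 joins them through a, and rows 3 and 4 are linked by r, which matches the g column above.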
