Normalizing Data By Duplication
Note: this question is a duplicate of "Split pandas dataframe string entry to separate rows", but the answer provided here is more generic and informative.
Solution 1:
Try this:
In [44]: df
Out[44]:
        id  value
0        a    156
1      b,c    457
2  e,g,f,h    346
In [45]: (df['id'].str.split(',', expand=True)
   ....:          .stack()
   ....:          .reset_index(level=0)
   ....:          .set_index('level_0')
   ....:          .rename(columns={0:'id'})
   ....:          .join(df.drop('id', axis=1), how='left')
   ....: )
Out[45]:
  id  value
0  a    156
1  b    457
1  c    457
2  e    346
2  g    346
2  f    346
2  h    346
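The chain above can be wrapped in a small helper so it works for any column and separator. This is only a sketch: the name `split_rows` is made up for illustration, it assumes the frame has a default unnamed index (so that `reset_index(level=0)` produces a column called `level_0`), and the extra `.dropna()` is an assumption to guard against pandas versions in which `stack()` keeps the missing cells created by `expand=True`.

```python
import pandas as pd


def split_rows(frame, column, sep=","):
    """Return a frame with one row per `sep`-separated item in `column`.

    Hypothetical helper; assumes `frame` has a plain, unnamed index.
    """
    pieces = (
        frame[column]
        .str.split(sep, expand=True)   # one column per item, short rows padded with missing values
        .stack()                       # long Series indexed by (row label, item position)
        .dropna()                      # assumption: no-op where stack() already drops missing cells
        .reset_index(level=0)          # row label becomes the column 'level_0'
        .set_index("level_0")          # row label is the index again, now repeated per item
        .rename(columns={0: column})   # the unnamed values column is labelled 0
    )
    # Left join on the index: each item picks up the other columns of its source row.
    return pieces.join(frame.drop(columns=column), how="left")


df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})
result = split_rows(df, "id")
print(result)
```

Using `drop(columns=...)` instead of a positional axis argument keeps the helper working on recent pandas releases, where `drop` only accepts `axis` as a keyword.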
Explanation:
In [48]: df['id'].str.split(',', expand=True).stack()
Out[48]:
0  0    a
1  0    b
   1    c
2  0    e
   1    g
   2    f
   3    h
dtype: object

In [49]: df['id'].str.split(',', expand=True).stack().reset_index(level=0)
Out[49]:
   level_0  0
0        0  a
0        1  b
1        1  c
0        2  e
1        2  g
2        2  f
3        2  h

In [50]: df['id'].str.split(',', expand=True).stack().reset_index(level=0).set_index('level_0')
Out[50]:
         0
level_0
0        a
1        b
1        c
2        e
2        g
2        f
2        h

In [51]: df['id'].str.split(',', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0:'id'})
Out[51]:
        id
level_0
0        a
1        b
1        c
2        e
2        g
2        f
2        h

In [52]: df.drop('id', axis=1)
Out[52]:
   value
0    156
1    457
2    346

The final .join(..., how='left') matches Out[51] and Out[52] on the index, so each split id picks up the value of the row it came from.
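As a cross-check on the steps above, the same result can also be obtained with `DataFrame.explode`. This is a different technique from the stack/join chain in the answer, not a restatement of it, and the sketch assumes pandas 0.25 or later, where `explode` is available.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})

# Turn each 'id' string into a list, then emit one row per list element.
# The other columns and the original index labels are repeated for each element.
exploded = df.assign(id=df["id"].str.split(",")).explode("id")
print(exploded)
```

Unlike the chain above, this leaves the index unnamed, so the two results agree in their values and index labels but not in the index name.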