
Normalizing Data By Duplication

Note: this question is indeed a duplicate of "Split pandas dataframe string entry to separate rows", but the answer provided here is more generic and informative, so with all respect

Solution 1:

Try this:

In [44]: df
Out[44]:
        id  value
0        a    156
1      b,c    457
2  e,g,f,h    346

In [45]: (df['id'].str.split(',', expand=True)
   ....:          .stack()
   ....:          .reset_index(level=0)
   ....:          .set_index('level_0')
   ....:          .rename(columns={0:'id'})
   ....:          .join(df.drop(columns='id'), how='left')
   ....: )
Out[45]:
  id  value
0  a    156
1  b    457
1  c    457
2  e    346
2  g    346
2  f    346
2  h    346
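The chain above can be wrapped into a small standalone script. This is only a sketch under a few assumptions: the helper name `split_rows` and its parameters are hypothetical (they do not appear in the original answer), the sample frame is rebuilt from the output shown above, and the extra `dropna()` is a defensive addition in case a given pandas release keeps the padding values after `stack()`.

```python
import pandas as pd

# Sample frame rebuilt from the Out[44] listing above.
df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})


def split_rows(frame, column, sep=","):
    """Hypothetical helper: return one row per separated item in `column`."""
    pieces = (
        frame[column]
        .str.split(sep, expand=True)      # one column per item, short rows are padded
        .stack()                          # long format: (original row, item position)
        .dropna()                         # assumption: drop padding if stack() kept it
        .reset_index(level=1, drop=True)  # keep only the original row label
        .rename(column)                   # name the Series after the source column
    )
    # Join the remaining columns back on the original row label.
    return pieces.to_frame().join(frame.drop(columns=column), how="left")


result = split_rows(df, "id")
print(result)
```

Dropping the item-position level directly, instead of `reset_index(level=0)` followed by `set_index('level_0')`, is a stylistic choice; both leave the original row label as the index that `join` aligns on.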

Explanation:

In [48]: df['id'].str.split(',', expand=True).stack()
Out[48]:
0  0    a
1  0    b
   1    c
2  0    e
   1    g
   2    f
   3    h
dtype: object

In [49]: df['id'].str.split(',', expand=True).stack().reset_index(level=0)
Out[49]:
   level_0  0
0        0  a
0        1  b
1        1  c
0        2  e
1        2  g
2        2  f
3        2  h

In [50]: df['id'].str.split(',', expand=True).stack().reset_index(level=0).set_index('level_0')
Out[50]:
         0
level_0
0        a
1        b
1        c
2        e
2        g
2        f
2        h

In [51]: df['id'].str.split(',', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0:'id'})
Out[51]:
        id
level_0
0        a
1        b
1        c
2        e
2        g
2        f
2        h

In [52]: df.drop(columns='id')
Out[52]:
   value
0    156
1    457
2    346
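For comparison, pandas 0.25 and later also provide `DataFrame.explode`, which is a different technique from the split/stack/join chain explained above but should give the same rows for this data. The snippet below is a minimal sketch; the sample frame is rebuilt from the listing in this answer, and the variable name `exploded` is just illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the listing in the answer.
df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})

# Replace each comma-separated string with a list of items,
# then emit one row per list element; other columns are repeated.
exploded = df.assign(id=df["id"].str.split(",")).explode("id")
print(exploded)
```

As in the original approach, the index keeps the label of the source row, so duplicated labels show which rows came from the same entry.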
