Sort Pandas Dataframe By String Column That Represents (mostly) Numbers?
I have data similar to this. data = [ dict(name = 'test1', index = '1' , status='fail'), dict(name = 'test3', index = '3', status='pass'), dict(name = 'test1', index = '11', status
Solution 1:
This will sort by the name and by a temporary column (__ix) that holds the first integer (run of consecutive digits) found in each 'index' string:
Update: You can also use:
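df = (
    df
    .assign(
        # expand=False returns a Series rather than a one-column DataFrame;
        # astype(int) raises if some 'index' value contains no digits
        __ix=df['index'].str.extract(r'([0-9]+)', expand=False).astype(int)
    )
    .sort_values(['name', '__ix'])
    .drop('__ix', axis=1)  # optional: remove the tmp column
    .reset_index(drop=True)  # optional: renumber 0..n-1 (otherwise the index stays scrambled)
)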
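A self-contained run of the str.extract approach may help. This is a sketch on a few hypothetical rows (not the asker's data), including one hypothetical value, 'n/a', with no digits at all, to cover the "(mostly)" case in the title. Instead of astype(int), it uses pd.to_numeric, which leaves digit-less rows as NaN, so they can be sorted to the end rather than raising.

```python
import pandas as pd

# Hypothetical rows for illustration; 'n/a' stands in for an index with no digits.
df = pd.DataFrame({
    "name":   ["test1", "test3", "test1", "test1", "test3"],
    "index":  ["11", "5:1:50", "n/a", "2", "3"],
    "status": ["pass", "pass", "fail", "fail", "pass"],
})

# expand=False makes str.extract return a Series; rows with no digits become NaN.
# to_numeric converts the extracted digit strings to numbers and keeps NaN as NaN.
ix = pd.to_numeric(df["index"].str.extract(r"([0-9]+)", expand=False))

result = (
    df.assign(__ix=ix)
    # na_position="last" is the default, spelled out: within each name,
    # rows whose index had no digits come after the numbered ones
    .sort_values(["name", "__ix"], na_position="last")
    .drop(columns="__ix")
    .reset_index(drop=True)
)
print(result)
```

Whether digit-less rows should go first or last is a choice; na_position="first" flips it.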
Original:
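import re

df = (
    df
    .assign(
        # group(1) is the captured digit run; group(0) would also include
        # any leading non-digits and make int() fail
        __ix=df['index']
        .apply(lambda s: int(re.match(r'\D*(\d+)', s).group(1)))
    )
    .sort_values(['name', '__ix'])
    .drop('__ix', axis=1)
    .reset_index(drop=True)
)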
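To see why the capture group matters in that regex, here is a minimal check of what re.match(r'\D*(\d+)', s) returns for a value with a non-digit prefix. The value 'v12:3' and the helper first_int are hypothetical, for illustration only; first_int returns None instead of raising when a string has no digits.

```python
import re

# A hypothetical index value with a non-digit prefix.
m = re.match(r"\D*(\d+)", "v12:3")

print(m.group(0))  # prints v12 (the whole match, prefix included)
print(m.group(1))  # prints 12 (only the captured digits)

# Hypothetical helper: first run of digits as an int, or None if there is none.
def first_int(s):
    match = re.match(r"\D*(\d+)", s)
    return int(match.group(1)) if match else None
```

With this helper, df['index'].apply(first_int) would give None/NaN for digit-less rows instead of raising AttributeError on a failed match.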
On your data (thanks for providing an easy reproducible example), first check what that __ix column contains:
df['index'].apply(lambda s: int(re.match(r'\D*(\d+)', s).group(1)))
# out:
# 0     1
# 1     3
# 2    11
# 3     1
# 4    20
# 5     2
# 6     5
After sorting, your df becomes:
    name   index status
0  test1       1   fail
1  test1  121456   fail
2  test1       2   fail
3  test1      11   pass
4  test3       3   pass
5  test3  5:1:50   pass
6  test3      20   fail
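Solution 1 compares only the first integer, so two values that share it (such as a '1 ...' value and '1') are left in their original relative order. If every number in the string should count, a natural-sort key is one option. This is a swapped-in technique, not part of either answer; natural_key is a hypothetical helper and the rows are hypothetical. It sorts row labels in plain Python and reorders with df.loc, which avoids relying on how sort_values handles list-valued keys.

```python
import re
import pandas as pd

def natural_key(s):
    # re.split with a capturing group keeps the digit runs, and the result always
    # alternates non-digit text (even positions) and digits (odd positions),
    # so comparing two keys never compares an int with a str.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Hypothetical rows; '1 2 14' and '1 2 3' share the same first integer.
df = pd.DataFrame({
    "name":   ["test1", "test1", "test1", "test1", "test3", "test3"],
    "index":  ["11", "1 2 14", "2", "1 2 3", "20", "5:1:50"],
    "status": ["pass", "fail", "fail", "fail", "fail", "pass"],
})

# Sort row labels by (name, natural key of index), then reorder the frame.
order = sorted(df.index, key=lambda i: (df.at[i, "name"], natural_key(df.at[i, "index"])))
result = df.loc[order].reset_index(drop=True)
print(result)
```

Here '1 2 3' lands before '1 2 14' because the third numbers (3 and 14) are compared as integers.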
Solution 2:
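One possibility is to add helper columns, one with the length of each 'index' string and one with its first character, then sort by name, length and first character:
df['sort'] = df['index'].str.len()  # shorter strings first ('2' before '11')
df['sort2'] = df['index'].str[0]  # then by first character ('11' before '20')
df1 = df.sort_values(by=['name', 'sort', 'sort2'])
df1 = df1.drop(columns=['sort', 'sort2'])  # df itself still has the helper columns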
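A variant of the length idea, sketched under the assumption that every 'index' value is a plain non-negative integer without leading zeros (which is not true of values like '5:1:50'): sort by length and then by the full string rather than only its first character. The rows are hypothetical, and _len is an illustrative helper column name.

```python
import pandas as pd

# Hypothetical rows of plain non-negative integers without leading zeros.
df = pd.DataFrame({
    "name":   ["test1", "test1", "test1", "test1", "test3", "test3"],
    "index":  ["12", "2", "11", "100", "20", "3"],
    "status": ["pass", "fail", "pass", "fail", "fail", "pass"],
})

# For such strings a shorter string is always a smaller number, and two strings
# of equal length order the same way as text and as numbers. Comparing the full
# string (not just its first character) separates '11' from '12', which the
# first-character column alone leaves tied.
result = (
    df.assign(_len=df["index"].str.len())
    .sort_values(["name", "_len", "index"])
    .drop(columns="_len")
    .reset_index(drop=True)
)
print(result)
```

Using assign keeps the helper column off the original df.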