
Splitting A Column In A Dataframe Based On Multiple Possible Delimiters

I have an address column in a pandas DataFrame that holds three pieces of information: street, colony and city. The three values are separated by one of two possible delimiters: either a ',' or a whitespace.

Solution 1:

If you are certain the delimiter is either a comma ',' or a whitespace, you could use:

df[['Street','Colony','City']] = df.address.str.split('[ ,]', expand=True)

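Explanation: str.split accepts a pat (pattern) parameter, a string or regular expression to split on; if it is not specified, the string is split on whitespace. Because pat can be a regular expression, this becomes an easy task: in regex, the character class [ ,] matches either a space ' ' or a comma ','. pandas treats a pat longer than one character as a regular expression by default; from pandas 1.4 onwards you can also pass regex=True to make that explicit.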

An alternative would be the pattern ' |,', or '\s+|,' if there can be several consecutive whitespace characters.
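The difference between '[ ,]' and '\s+|,' shows up as soon as an address contains more than one space in a row. The sketch below uses made-up sample values: with '[ ,]' every single space is a split point, so runs of spaces leave empty strings behind and produce extra columns, while '\s+|,' consumes a whole run of whitespace as one delimiter.

```python
import pandas as pd

# Hypothetical sample data: the second address uses runs of spaces.
df = pd.DataFrame({'address': ['a,b,c', 'a  b   c']})

# '[ ,]' splits on every single space or comma, so consecutive spaces
# create empty strings and the result gets more than three columns.
naive = df['address'].str.split('[ ,]', expand=True)
print(naive.shape)  # (2, 6)

# '\s+|,' treats a run of whitespace as a single delimiter.
# (A pattern longer than one character is interpreted as a regex.)
parts = df['address'].str.split(r'\s+|,', expand=True)
parts.columns = ['Street', 'Colony', 'City']
print(parts)
```

Note that neither pattern handles a comma followed by a space (e.g. 'a, b, c') cleanly; that case would need a pattern that absorbs optional whitespace around the comma as well.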


Full example:

import pandas as pd

df = pd.DataFrame({
    'address': ['a,b,c','a b c']
})

df[['Street','Colony','City']] = df.address.str.split('[ ,]', expand=True)

print(df)

Output:

  address Street Colony City
0   a,b,c      a      b    c
1   a b c      a      b    c

Solution 2:

Try this:

import re
df[['Street','Colony','City']] = df.address.apply(lambda x: pd.Series(re.split(r'\W', x)))

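\W matches any character that is not a word character (a word character being a letter, digit or underscore); see the documentation of Python's re module. Be aware that this also splits on characters such as '.' or '-', which may appear inside a street name.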
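The same \W split can be done without apply by passing the pattern straight to the vectorised str.split, as in Solution 1. The sketch below (sample data assumed) shows that, and also the main pitfall of \W: every non-word character is its own split point, so ", " between two values produces an empty field.

```python
import re
import pandas as pd

df = pd.DataFrame({'address': ['a,b,c', 'a b c']})

# Vectorised equivalent of apply + re.split: split on any non-word
# character (anything other than letters, digits and '_').
df[['Street', 'Colony', 'City']] = df['address'].str.split(r'\W', expand=True)
print(df)

# Pitfall: a comma followed by a space counts as two delimiters,
# leaving an empty string between them.
print(re.split(r'\W', 'a, b, c'))  # ['a', '', 'b', '', 'c']
```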

Solution 3:

One way to accomplish this is to use re.sub to consolidate your delimiters into one, then call Python's built-in str.split on that single delimiter to create your new columns.

import pandas as pd 
import re

df = pd.DataFrame({'address':['Street1,Colony1,City1',  'Street2 Colony2 City2']})

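location_df = (df.address
                 # Turn every space or comma into a comma, split on the comma,
                 # and return the three parts as a labelled Series per row.
                 .apply(lambda x: pd.Series(re.sub(pattern=' |,', 
                                                   repl=',', 
                                                   string=x).split(','), 
                                            index=['street','colony','city']))
                )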
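The same "consolidate, then split" idea can also be expressed with pandas' vectorised string methods, which avoids building a Series per row. This is a sketch using the sample data from the snippet above; it additionally joins the new columns back onto the original frame, which the apply version leaves as a separate location_df.

```python
import pandas as pd

df = pd.DataFrame({'address': ['Street1,Colony1,City1', 'Street2 Colony2 City2']})

# Step 1: consolidate the delimiters by turning every space into a comma.
normalised = df['address'].str.replace(' ', ',', regex=False)

# Step 2: split on the single remaining delimiter (a one-character
# pattern is treated as a literal string, not a regex).
location_df = normalised.str.split(',', expand=True)
location_df.columns = ['street', 'colony', 'city']

# Step 3: attach the new columns to the original frame (aligned on the index).
df = df.join(location_df)
print(df)
```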
