Replacing Newlines With Spaces For Str Columns Through Pandas Dataframe
Given an example dataframe with the 2nd and 3rd columns of free text, e.g.
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\
Solution 1:
Use replace: first strip the leading and trailing whitespace, then replace \n:
df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
print(df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
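For a self-contained run of this approach, a minimal sketch follows. The sample rows are an assumption (the question's data is truncated above), chosen so that the text columns contain leading, trailing, and embedded whitespace.

```python
import pandas as pd

# Hypothetical sample data: two numeric columns and two free-text columns.
df = pd.DataFrame([[1, 2, ' abc', 'foo\nbar'],
                   [3, 1, 'def\nhaha', 'love it\n']])

# Step 1: drop leading/trailing whitespace.
# Step 2: turn any remaining (embedded) newline into a space.
cleaned = (df.replace({r'\s+$': '', r'^\s+': ''}, regex=True)
             .replace(r'\n', ' ', regex=True))

print(cleaned)
```

Stripping first matters here: if the newline replacement ran first, a trailing newline would become a trailing space that then still has to be removed.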
Solution 2:
You can use select_dtypes to select the columns of type object and then use applymap on those columns. Because these functions have no inplace argument, the workaround is to assign the result back to the dataframe:
# lol is assumed to be the DataFrame here
strs = lol.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
lol[strs.columns] = strs
lol
#    0  1         2        3
# 0  1  2       abc  foo bar
# 1  3  1  def haha  love it
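In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map. A hedged sketch that works under either name, and that sidesteps dtype selection by checking each cell with isinstance, might look like the following. The helper name clean_cell and the sample rows are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'],
                   [3, 1, 'def\nhaha', 'love it\n']])


def clean_cell(value):
    """Replace newlines with spaces and strip both ends; leave non-strings untouched."""
    if isinstance(value, str):
        return value.replace('\n', ' ').strip()
    return value


# DataFrame.map exists in newer pandas; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, 'map') else df.applymap
df = elementwise(clean_cell)

print(df)
```

The isinstance guard trades some speed for safety: numeric cells and missing values pass through unchanged instead of raising an AttributeError.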
Solution 3:
Adding to the other nice answers, this is a vectorized version of your initial idea:
columns = [2,3]
df.iloc[:, columns] = [df.iloc[:,col].str.strip().str.replace('\n',' ')
                       for col in columns]
Details:
In [49]: df.iloc[:, columns] = [df.iloc[:,col].str.strip().str.replace('\n',' ')
for col in columns]
In [50]: df
Out[50]:
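   0  1        2         3
0  1  2      abc  def haha
1  3  1  foo bar   love it
Note that 'def haha' and 'foo bar' have swapped places compared with the other solutions. The list of Series appears to be written into the selected block one Series per row, so the result needs to be transposed or assigned column by column.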
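To keep each cleaned Series in its own column regardless of how a list of Series is laid out on assignment, the same vectorized calls can be applied column by column. A minimal sketch, assuming columns 2 and 3 hold only strings; the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'],
                   [3, 1, 'def\nhaha', 'love it\n']])

columns = [2, 3]  # labels of the free-text columns

# apply() passes each selected column to the lambda as a Series, so the
# vectorized .str methods run once per column and the result keeps its labels.
df[columns] = df[columns].apply(
    lambda s: s.str.strip().str.replace('\n', ' ', regex=False)
)

print(df)
```

Because the right-hand side is a DataFrame with the same column labels, the assignment aligns on those labels rather than on position.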
Solution 4:
You may use the following approach, which combines two regex replacements in one call:
>>> df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
Details
'\A\s+|\s+\Z' -> '' will act like strip(), removing all leading and trailing whitespace:
\A\s+ - matches 1 or more whitespace symbols at the start of the string
| - or
\s+\Z - matches 1 or more whitespace symbols at the end of the string
'\n' -> ' ' will replace any newline with a space.
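The claim that the first pattern behaves like strip() can be checked on plain strings with Python's re module. This is only an illustrative sketch; the sample strings and the names STRIP_PATTERN and results are made up for the example.

```python
import re

# Same alternation as above: whitespace at the very start OR at the very end.
STRIP_PATTERN = re.compile(r'\A\s+|\s+\Z')

samples = ['  love it\n', 'foo\nbar', '\tdef\nhaha  ']
results = []

for text in samples:
    stripped = STRIP_PATTERN.sub('', text)
    # The embedded newline survives this step, exactly as with str.strip().
    assert stripped == text.strip()
    # Second replacement: any remaining newline becomes a space.
    results.append(stripped.replace('\n', ' '))

print(results)
```

The anchors \A and \Z match only at the absolute start and end of the string, so an embedded newline is left for the second replacement to handle.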