Unwanted White Spaces Resulting in Distorted Columns
I am trying to import a list of chemicals from a txt file that is space-delimited (not tab-delimited).
NO FORMULA NAME CAS No A B C D TMIN TMAX code ngas@TMIN ngas@25 C ngas@TMAX
1 CBrClF2 bromo
Solution 1:
I would suggest preprocessing the 'trial1.txt' file before loading it into a DataFrame. The following code should produce the result you want:
import pandas as pd

with open('trial1.txt') as f:
    l = f.readlines()
l = [i.split() for i in l]
# field count of the first data row (assumes its name is a single word)
target = len(l[1])
for i in range(1, len(l)):
    if len(l[i]) > target:
        # the name was split into two fields: rejoin them and drop the extra one
        l[i][2] = l[i][2] + ' ' + l[i][3]
        l[i].pop(3)
# assumes there is no '#' anywhere in your file; otherwise use some other rare symbol that doesn't exist in it
l = ['#'.join(k) for k in l]
l = [i + '\n' for i in l]
with open('trial2.txt', 'w') as f:
    f.writelines(l)
df = pd.read_csv('trial2.txt', sep='#', index_col=0)
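The loop above rejoins a name that was split into exactly two fields. A minimal sketch of a more general merge step follows, assuming the name is always the third field and is the only field that can contain spaces. The helper name `merge_name_tokens` and the shortened sample rows are illustrative, not part of the original answer.

```python
def merge_name_tokens(tokens, target, name_idx=2):
    """Collapse surplus tokens into the name field so the row has `target` fields.

    Hypothetical helper: assumes only the name column can contain spaces.
    """
    extra = len(tokens) - target
    if extra <= 0:
        # row already has the expected number of fields
        return list(tokens)
    # join the name token with the `extra` tokens that follow it
    name = " ".join(tokens[name_idx:name_idx + extra + 1])
    return tokens[:name_idx] + [name] + tokens[name_idx + extra + 1:]


# shortened sample rows (only the first four fields) for illustration
rows = [
    "1 CBrClF2 bromochlorodifluoromethane 353-59-3",
    "6 CBr4 carbon tetrabromide 558-13-4",
]
split_rows = [r.split() for r in rows]
target = len(split_rows[0])  # assumes the first row has a single-word name
fixed = [merge_name_tokens(r, target) for r in split_rows]
print(fixed[1])
```

Because the number of surplus tokens is computed per row, the same helper should also cope with names of three or more words, as long as the assumption about the name column holds.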
Solution 2:
Try this:
You basically have to strip out the spaces between the words in the NAME column. Here I first read the file and then remove those spaces using re.sub.
This code assumes that the words on either side of the space are each at least 5 lowercase letters long. You can change the number in {5,} as you see fit.
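import io
import re
import pandas as pd

with open('trial1.txt', 'r') as f:
    lines = f.readlines()
l = [re.sub(r"([a-z]{5,})\s([a-z]{5,})", r"\1\2", line) for line in lines]
# lines keep their '\n', so join without adding one; sep=r'\s+' is equivalent to delim_whitespace=True
df = pd.read_csv(io.StringIO(''.join(l)), sep=r'\s+')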
Prints:
NO  FORMULA                        NAME       CAS       No        A          B              C    D  TMIN  TMAX    code  ngas@TMIN  ngas@25  C.1  ngas@TMAX
 1  CBrClF2  bromochlorodifluoromethane  353-59-3  -0.0799  0.49660  -0.000063  -9.096100e-09  200  1500     2   96.65     142.14   572.33  NaN        NaN
 2  CBrCl2F  bromodichlorofluoromethane  353-58-2   4.0684  0.41343   0.000017  -3.438800e-08  200  1500     2   87.14     127.90   545.46  NaN        NaN
 3   CBrCl3       bromotrichloromethane   75-62-7   7.3767  0.35056   0.000069  -4.957100e-08  200  1500     2   79.86     116.73   521.53  NaN        NaN
 4    CBrF3       bromotrifluoromethane   75-63-8  -9.5253  0.65020  -0.000345   1.098700e-07  230  1500   1,2  123.13     156.61   561.26  NaN        NaN
 5   CBr2F2      dibromodifluoromethane   75-61-6   2.8167  0.49405  -0.000013  -2.862900e-08  200  1500     2  100.89     148.24   618.87  NaN        NaN
 6     CBr4          carbontetrabromide  558-13-4  10.6812  0.32869   0.000107  -6.078800e-08  200  1500     2   80.23     116.62   540.18  NaN        NaN
 7    CClF3      chlorotrifluoromethane   75-72-9  13.8075  0.47487  -0.000134   2.248500e-08  230  1500   1,2  116.23     144.10   501.22  NaN        NaN
 8     CClN            cyanogenchloride  506-77-4   0.8665  0.36619  -0.000030  -1.319100e-08  200  1500     2   72.80     107.03   438.19  NaN        NaN
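The substitution used in Solution 2 can be checked on a single line before running it over the whole file. The sketch below applies the same pattern to one row taken from the data and to one hypothetical row (the "zinc oxide" line and its placeholder CAS number are made up for illustration) to show a limitation: a name whose first word has fewer than 5 letters is left untouched, so it would still split into two columns.

```python
import re

# Same pattern as in Solution 2: two runs of 5+ lowercase letters
# separated by a single whitespace character.
PATTERN = re.compile(r"([a-z]{5,})\s([a-z]{5,})")


def join_name(line):
    """Remove the whitespace between two adjacent long lowercase words."""
    return PATTERN.sub(r"\1\2", line)


sample = "6 CBr4 carbon tetrabromide 558-13-4 10.6812"
joined = join_name(sample)
print(joined)

# Hypothetical row, not from the original file: "zinc" has only 4 letters,
# so the pattern does not match and the space is kept.
short = "9 ZnO zinc oxide 0000-00-0"
print(join_name(short))
```

If the file contains such short words, lowering the 5 in `{5,}` is one option, at the risk of also joining unrelated lowercase tokens elsewhere on the line.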