Pandas: How To Import "horse-colic.data", A Space Separated Data File?
I am trying to import the horse-colic dataset from its '.data' file. The content of the file looks like this: 2 1 530101 38.50 66 28 3 3 ? 2 5 4 4 ? ? ? 3 5 45.00 8.40 ? ? 2 2 11300 00000 00000 2 1 1
Solution 1:
df.replace("?", np.nan) returns a copy of df with "?" replaced by NaN. To change df itself you'd do df = df.replace("?", np.nan)
or
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/horse-colic/horse-colic.data', sep=r'\s+', header=None).replace("?", np.nan)
Or, as @Vaishali says, add inplace=True.
I'm not entirely sure what you mean when you say header=None doesn't work. When I leave it out I get a DataFrame with the first row of the Horse Colic data set as my column names.
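To make the points above concrete, here is a minimal sketch that can be run without downloading anything. The sample text is an assumption: its first row is the first ten fields of the row quoted in the question, and the second row is made up for illustration. The last step, na_values="?", is an alternative that is not part of the answer above; it asks read_csv to treat "?" as missing while parsing.

```python
import io

import numpy as np
import pandas as pd

# Shortened sample: row 1 is the first ten fields from the question,
# row 2 is hypothetical and only there to give the frame a second row.
raw = (
    "2 1 530101 38.50 66 28 3 3 ? 2\n"
    "1 1 534817 39.2 88 20 ? ? 4 1\n"
)

# header=None keeps the first line as data and numbers the columns 0..9.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# replace returns a new frame; df itself still contains the "?" strings.
cleaned = df.replace("?", np.nan)

# Without header=None the first data row is consumed as column names.
no_header = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Alternative (not from the answer): mark "?" as missing at parse time,
# so the affected columns come back as floats rather than strings.
parsed = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, na_values="?")

print(df.shape, no_header.shape)
print(cleaned.isna().sum().sum(), parsed.isna().sum().sum())
```

One difference worth noting: after replace, the columns that held "?" still store their remaining values as strings, whereas na_values lets pandas parse them as numbers directly.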
Solution 2:
A CSV file is one that has each item separated by commas; this file is separated by spaces instead.
I recommend using numpy.genfromtxt, then converting that into a data frame.
The first bit would go like this:
import numpy as np
# '?' marks a missing value; read it as NaN (deletechars only cleans column names, not data)
data = np.genfromtxt('filename.txt', missing_values='?', filling_values=np.nan)
If you need it in a pandas data frame, that is usually a pretty smooth transition: pass the array to pd.DataFrame.
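A minimal sketch of that route, from genfromtxt to a data frame, might look like the following. The sample text is an assumption (the first row is taken from the first ten fields of the question's row, the second is made up), and an in-memory buffer stands in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Shortened, partly hypothetical sample in the same whitespace-separated layout.
raw = (
    "2 1 530101 38.50 66 28 3 3 ? 2\n"
    "1 1 534817 39.2 88 20 ? ? 4 1\n"
)

# With no delimiter given, genfromtxt splits on whitespace.
# "?" is declared as the missing-value marker and filled with NaN.
data = np.genfromtxt(io.StringIO(raw), missing_values="?", filling_values=np.nan)

# The result is a plain 2-D float array, which DataFrame accepts directly.
df = pd.DataFrame(data)

print(data.shape)
print(int(np.isnan(data).sum()))
```

Because genfromtxt defaults to a float dtype, every column comes back as float, including identifier-like fields such as 530101; codes with leading zeros, like the 00000 in the question's row, would lose them. If that matters, the read_csv route keeps more control over column types.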