
Pandas: How To Import "horse-colic.data", A Space Separated Data File?

I am trying to import the horse-colic dataset, which is stored in a '.data' file. The content of the file looks like this: 2 1 530101 38.50 66 28 3 3 ? 2 5 4 4 ? ? ? 3 5 45.00 8.40 ? ? 2 2 11300 00000 00000 2 1 1

Solution 1:

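df.replace("?", np.nan) returns a copy of df with "?" replaced by NaN; it does not modify df. (np.nan is used here because the np.NaN alias was removed in NumPy 2.0.) To change df itself, assign the result back with df = df.replace("?", np.nan), or chain the call onto the read: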

df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/horse-colic/horse-colic.data', sep=r'\s+', header=None).replace("?", np.nan)

Or, as @Vaishali says, add inplace=True so that replace modifies df directly: df.replace("?", np.nan, inplace=True).
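To make the copy-versus-assign point concrete, here is a minimal sketch. The tiny DataFrame and the name still_has_marks are made up for illustration and are not from the horse-colic file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real data
df = pd.DataFrame({"a": ["1", "?"], "b": ["?", "2"]})

# The result is thrown away here, so df still contains "?"
df.replace("?", np.nan)
still_has_marks = bool((df == "?").any().any())

# Assigning the result back keeps the replacement
df = df.replace("?", np.nan)
# Alternative with the same effect: df.replace("?", np.nan, inplace=True)
```

After the reassignment, df.isna() should flag the two cells that held "?".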

I'm not entirely sure what you mean when you say header=None doesn't work. When I leave it out I get a DataFrame with the first row of the Horse Colic data set as my column names.
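Here is a small sketch of the header=None difference, using an in-memory string instead of the URL. The two rows are shortened, illustrative rows in the style of the file, and passing na_values="?" is an alternative I am assuming you would accept in place of the separate replace call.

```python
import io
import pandas as pd

# Two shortened, illustrative rows (not the full 28-column records)
sample = "2 1 530101 38.50 66 28 3 3 ? 2\n1 1 534817 39.2 88 20 ? ? 4 1\n"

# header=None: both rows are data, columns are numbered 0..9,
# and na_values="?" turns the question marks into NaN while parsing
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, na_values="?")

# Without header=None the first row is consumed as column names
df_wrong = pd.read_csv(io.StringIO(sample), sep=r"\s+", na_values="?")

print(df.shape, df_wrong.shape)
```

The second frame ends up one row short, which is the symptom described above.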

Solution 2:

A CSV file is one that has each item separated by commas, whereas this file is separated by spaces. I recommend using numpy.genfromtxt, which splits on whitespace by default, and then converting the result into a data frame.

The first bit would go like this:

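import numpy as np

# Treat "?" as a missing value and fill those entries with NaN.
# (deletechars only affects field names, so it does not handle "?" in the data.)
data = np.genfromtxt('filename.txt', missing_values='?', filling_values=np.nan)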

If you need it in a pandas data frame, that is usually a pretty smooth transition: pass the array to pd.DataFrame(data).
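As a rough sketch of the whole route, assuming an in-memory sample in place of 'filename.txt' (the two rows are shortened and illustrative, not full records):

```python
import io
import numpy as np
import pandas as pd

# Two shortened, illustrative rows standing in for the real file
sample = "2 1 530101 38.50 66 28 3 3 ? 2\n1 1 534817 39.2 88 20 ? ? 4 1\n"

# genfromtxt splits on whitespace by default; "?" entries become NaN
data = np.genfromtxt(io.StringIO(sample), missing_values="?", filling_values=np.nan)

# Wrap the float array in a DataFrame; columns are numbered 0..9
df = pd.DataFrame(data)
print(df.isna().sum().sum())
```

One thing to keep in mind with this route is that every column comes back as float, so integer codes such as the hospital number would need converting afterwards if you want them as integers.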

Reference:

  1. numpy.genfromtxt - NumPy Manual
