Ignoring Non-numerical String Values In Pandas Dataframe

I have a DataFrame in which a column might have three kinds of values, integers (12331), integers as strings ('345') or some other string ('text'). Is there a way to drop all rows

Solution 1:

Pandas has tools for converting this kind of column, though they may not fit your needs exactly. pd.to_numeric converts a mixed column like yours, and with errors='coerce' it turns non-numeric strings into NaN. As a result you get a float column rather than an integer one, because NumPy's integer dtypes cannot hold NaN. That usually doesn't matter much, but it's good to be aware of.

import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

pd.to_numeric(df['mixed_types'], errors='coerce')
Out[7]: 
0    12331.0
1      345.0
2        NaN
Name: mixed_types, dtype: float64

If you want to then drop all the NaN rows:

# Replace the column with the converted values
df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')

# Drop NA values, listing the converted columns explicitly
#   so NA values in other columns aren't dropped
df.dropna(subset=['mixed_types'])
Out[11]: 
   mixed_types
0      12331.0
1        345.0
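
If you need integers back after the non-numeric rows are gone, one possible follow-up is to cast the column once no NaN values remain. The sketch below is only an illustration built on the same sample frame; the variable name clean is an assumption, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

clean = (
    # Coerce to numeric: 'text' becomes NaN, so the column is float64 here
    df.assign(mixed_types=pd.to_numeric(df['mixed_types'], errors='coerce'))
      # Drop only the rows where this particular column failed to convert
      .dropna(subset=['mixed_types'])
      # With no NaN left, the column can be cast back to a plain integer dtype
      .astype({'mixed_types': 'int64'})
)

print(clean)
print(clean['mixed_types'].dtype)
```

The cast is only safe after the dropna step, since int64 cannot represent NaN.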

Solution 2:

You could use pd.to_numeric with errors='coerce' to replace non-numeric values with NaN, applying it to each column. Then use dropna or fillna, whichever you prefer.

import pandas as pd

df = pd.read_csv('file.csv')
df = df.apply(pd.to_numeric, errors='coerce')
df = df.dropna()
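
To see what this does across several columns, here is a small sketch that uses an in-memory frame in place of file.csv. The column names and values are made up for illustration. It also shows that dropna() without arguments removes a row when any column is NaN, so a column that is genuinely text would cause every row to be dropped.

```python
import pandas as pd

# Hypothetical data standing in for the contents of file.csv
df = pd.DataFrame({
    'a': [1, '2', 'text', 4],
    'b': ['10', 'oops', '30', 40],
})

# Convert every column; values that cannot be parsed become NaN
numeric = df.apply(pd.to_numeric, errors='coerce')

# No arguments: a row is dropped if ANY column holds NaN
all_numeric_rows = numeric.dropna()

# subset=: only the listed columns are checked for NaN
rows_with_numeric_a = numeric.dropna(subset=['a'])

print(all_numeric_rows)
print(rows_with_numeric_a)
```

Passing subset= is probably the safer choice when only some columns are expected to be numeric.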

Solution 3:

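You can use df._get_numeric_data() directly. Note that it is a private method and keeps only the columns whose dtype is already numeric, so a mixed object column like the one in the question is left out entirely rather than filtered row by row. df.select_dtypes(include='number') is the closest public alternative.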
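
As a rough illustration of dtype-based selection, the sketch below uses the public select_dtypes method on a made-up two-column frame (the numbers column is an assumption added for the example). The mixed column has object dtype, so it is excluded as a whole and no rows are removed.

```python
import pandas as pd

df = pd.DataFrame({
    'mixed_types': [12331, '345', 'text'],  # object dtype because of the strings
    'numbers': [1.5, 2.5, 3.5],             # float64 dtype
})

# Select columns by dtype; this works on whole columns, not individual values
numeric_cols = df.select_dtypes(include='number')

print(numeric_cols.columns.tolist())
print(len(numeric_cols))
```

If the goal is to filter rows inside a mixed column, pd.to_numeric with errors='coerce' is likely the better fit.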
