
Python (pyspark) Error = ValueError: Could Not Convert String To Float: "17"

I am working with Python on Spark and reading my dataset from a .csv file whose first few rows are:

17 0.2 7
17 0.2 7
39 1.3 7
19 1 7
19 0 7

When I read from the f

Solution 1:

Following the comments below this answer, you should use:

[float(x.strip(' "')) for x in line.split(',')]

You do not need to replace ',' with ' '. Simply split on ',', then remove the leading and trailing whitespace and quotes from each field (x.strip(' "')) before converting it to float.
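A minimal sketch of that approach, assuming each line holds comma-separated numbers wrapped in double quotes (as the "17" in the error message suggests) and has no trailing newline, which is the case for lines produced by Spark's textFile. The function name parse_line and the sample line are hypothetical:

```python
def parse_line(line):
    """Split a comma-separated line, strip spaces and double quotes from
    each field, and convert the result to float."""
    return [float(x.strip(' "')) for x in line.split(',')]

# Hypothetical input: quoted fields separated by ", "
sample = '"17", "0.2", "7"'
print(parse_line(sample))  # [17.0, 0.2, 7.0]
```

In Spark you would then map it over the lines, e.g. sc.textFile(path).map(parse_line), where sc and path stand for your SparkContext and file path.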

Also, have a look at the csv packages, which may simplify your work.
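As one example, Python's standard-library csv module removes the surrounding quotes itself, so only the float conversion is left to you. This sketch assumes quoted, comma-separated numeric fields; read_floats and the sample data are hypothetical:

```python
import csv
import io

# Hypothetical file contents with quoted numeric fields
data = '"17","0.2","7"\n"39","1.3","7"\n'

def read_floats(text):
    """Parse CSV text with the csv module (which drops the quotes),
    skip blank rows, and convert every field to float."""
    # skipinitialspace=True also handles input like '"17", "0.2", "7"'
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [[float(field) for field in row] for row in reader if row]

rows = read_floats(data)
print(rows)  # [[17.0, 0.2, 7.0], [39.0, 1.3, 7.0]]
```

If you prefer to stay in Spark, its built-in CSV reader (spark.read.csv, available from Spark 2.0) also understands double-quoted fields.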

Below is the answer to the original question (before comments).

You need to use .split() instead of .split(' '). You have multiple consecutive space characters in your line, so splitting on ' ' results in empty strings, e.g. your first line is split into:

['17', '', '0.2', '', '7']

The problem is those empty strings, which you (obviously) cannot convert to float.
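To see where the error comes from, here is a small reproduction, assuming the first row contains two spaces between fields (which is what the split result above implies). The helper to_floats is hypothetical:

```python
def to_floats(tokens):
    """Convert every token to float; raises ValueError on any empty token."""
    return [float(t) for t in tokens]

# Assumed raw first row: two spaces between fields
line = "17  0.2  7"
tokens = line.split(' ')
print(tokens)  # ['17', '', '0.2', '', '7']

try:
    to_floats(tokens)
    failed = False
except ValueError:
    # float('') is invalid, so the whole conversion fails
    failed = True
```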

Using split() will solve the problem thanks to the behaviour of split when its sep argument is None (or not present):

If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed).
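Applied to the sample rows from the question, a sketch of the fix could look like this. It assumes the fields are separated by runs of spaces, and parse_whitespace_line is a hypothetical name:

```python
# Sample rows from the question, assuming runs of spaces between fields
sample_rows = [
    "17  0.2  7",
    "17  0.2  7",
    "39  1.3  7",
    "19  1  7",
    "19  0  7",
]

def parse_whitespace_line(line):
    """Split on any run of whitespace (sep=None) and convert each field to float."""
    return [float(x) for x in line.split()]

parsed = [parse_whitespace_line(r) for r in sample_rows]
print(parsed[0])  # [17.0, 0.2, 7.0]
```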

See the documentation of str.split, and this small example that shows the difference:

>>> sp5 = ' ' * 5
>>> sp5.split()
[]
>>> sp5.split(' ')
['', '', '', '', '', '']
