
Error In Labelled Point Object Pyspark

I am writing a function that takes an RDD as input, splits the comma-separated values, converts each row into a LabeledPoint object, and finally returns the output as a dataframe.

Solution 1:

You saw no errors until you ran the action:

output.take(5)

This is because Spark evaluates lazily: nothing is executed until you call an action such as "take(5)", so a bug in an earlier transformation only surfaces at that point.
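To illustrate the idea without a cluster, here is a plain-Python analogy (not Spark itself): the built-in map is also lazy, so a bad row only raises once the result is consumed. The parse function and the sample rows are made up for this sketch.

```python
def parse(line):
    # Convert every comma-separated field to a float.
    return [float(x) for x in line.split(",")]

rows = ["1.0,2.0", "3.0,oops"]  # hypothetical input; the second row is malformed

lazy = map(parse, rows)  # nothing is parsed here, so no error is raised yet

try:
    result = list(lazy)  # consuming the iterator plays the role of the "action"
    error = None
except ValueError as exc:
    result = None
    error = exc

print(type(error).__name__)
```

In Spark the roles are analogous: map builds the plan and take(5) triggers it.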

Your code has a few issues, and I think the failure comes from the extra "[" and "]" in [line[1:]]. Those brackets wrap the feature values in a second list, so remove them and keep only line[1:].

Another issue you may need to fix is that the dataframe has no explicit schema. Replace "toDF()" with "toDF(["features","label"])" so that the columns get names.
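Since the original code is not shown, the following is only a sketch of how the two fixes might fit together. It assumes that the first column is the label and the remaining columns are the features; parse_line and to_labeled_point_df are hypothetical names.

```python
def parse_line(line):
    # Assumption: column 0 is the label, the rest are features.
    values = [float(x) for x in line.split(",")]
    # Return the features as a flat list: values[1:], not [values[1:]].
    return values[0], values[1:]

def to_labeled_point_df(rdd):
    # Imported here so parse_line can be tried without Spark installed.
    from pyspark.mllib.regression import LabeledPoint

    points = rdd.map(parse_line).map(lambda p: LabeledPoint(p[0], p[1]))
    # Column names as suggested in the answer above; toDF is only
    # available on an RDD once a SparkSession has been created.
    return points.toDF(["features", "label"])

print(parse_line("1.0,2.0,3.0"))
```

Keeping the parsing in its own function means it can be checked on a single string before running the Spark job.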
