
Number Of Lines With Number Of Words Less Than 5

Using PySpark, I would like to find the number of lines that have fewer than 5 words. I wrote this code, but I couldn't figure out what is wrong with it: from pyspark.sql import SparkS

Solution 1:

I think you have some parentheses in the wrong place in this expression:

rdd1=rdd.filter(lambda line: len((line.split(" "))<5)).collect()

The way you have it, you're doing this:

len(... < 5)

Instead of this:

len(...) < 5
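To see the difference without a Spark cluster, here is a small plain-Python sketch. The function names and sample lines are made up for illustration; only the two expression shapes come from the answer above. It assumes Python 3, where comparing a list with an integer raises a TypeError.

```python
def has_fewer_than_five_words(line):
    # Parenthesis in the right place: len(...) is computed first, then compared with 5.
    return len(line.split(" ")) < 5


def misplaced_parenthesis(line):
    # Same shape as the original expression: the list is compared with 5 first,
    # and len() would only be applied to the result of that comparison.
    return len((line.split(" ")) < 5)


# Hypothetical sample data, not from the original question.
lines = ["a b c", "one two three four five six"]
short_lines = [line for line in lines if has_fewer_than_five_words(line)]

try:
    misplaced_parenthesis("a b c")
    error_name = None
except TypeError as exc:
    # Python 3 does not support '<' between a list and an int.
    error_name = type(exc).__name__
```

The corrected predicate can be passed to rdd.filter unchanged, provided each record in the RDD is a plain string.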

Solution 2:

I solved it. The problem was that I was trying to split a list. This is the new line:

rdd=rdd.filter(lambda line: len(line[0].split(" "))<5).collect()
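As a rough sketch of why indexing with [0] helps, the example below uses plain tuples as stand-ins for the records. This assumes each record is a tuple-like row with the text in its first position, as is the case for an RDD obtained from a DataFrame; the sample rows and the name is_short are hypothetical.

```python
# Stand-ins for RDD records: each one is tuple-like, with the text at index 0.
rows = [
    ("hello world",),
    ("this line has more than five words",),
    ("a b c d",),
]


def is_short(row):
    # row[0] is the string; calling split() on the row itself would fail.
    return len(row[0].split(" ")) < 5


short_rows = list(filter(is_short, rows))
short_count = len(short_rows)

# With a real RDD the same predicate could be used as:
#   rdd.filter(is_short).count()
```

Note that collect() returns the matching lines themselves. If only the number of lines is needed, count() on the filtered RDD returns it directly without bringing the rows back to the driver. Also, split(" ") produces empty strings when a line contains consecutive spaces, so split() with no argument may give a more accurate word count.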
