Number Of Lines With Number Of Words Less Than 5
Using PySpark, I would like to find the number of lines that have fewer than 5 words. I wrote this code, but I couldn't figure out what is wrong with it: from pyspark.sql import SparkS
Solution 1:
I think you have some parentheses in the wrong place in this expression:
rdd1=rdd.filter(lambda line: len((line.split(" "))<5)).collect()
The way you have it, you're doing this:
len(... < 5)
Instead of this:
len(...) < 5
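To see the difference without a Spark cluster, here is a minimal plain-Python sketch of the same predicate. The `sample_lines` data and the function name are made up for illustration and are not from the question.

```python
# Hypothetical sample data standing in for the lines of the RDD.
sample_lines = ["one two three", "a b c d e f", "hello world"]


def has_fewer_than_five_words(line):
    # len() is applied to the list of words first, then compared with 5.
    return len(line.split(" ")) < 5


short_lines = [line for line in sample_lines if has_fewer_than_five_words(line)]

# The misplaced version compares a list with an int before len() runs,
# which raises TypeError in Python 3.
try:
    len("one two three".split(" ") < 5)
    misplaced_error = None
except TypeError as exc:
    misplaced_error = exc

print(short_lines)
print(len(short_lines))
print(type(misplaced_error).__name__)
```

The same function could be passed to `rdd.filter(...)`. Since the question asks for a number rather than the lines themselves, calling `.count()` on the filtered RDD instead of `.collect()` would return it directly.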
Solution 2:
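I solved it. The problem was that I was trying to split a list: each element of the RDD was a list-like record rather than a string, so I had to take its first item before calling split. This is the new line: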
rdd=rdd.filter(lambda line: len(line[0].split(" "))<5).collect()
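As a rough illustration of why `line[0]` is needed, the sketch below assumes each record is a one-element tuple holding the text, which is roughly how a row from a single-column DataFrame behaves. The `rows` data and names are hypothetical.

```python
# Hypothetical records: each one wraps the line text in a one-element tuple.
rows = [("one two three",), ("a b c d e f",), ("hi",)]


def row_has_fewer_than_five_words(row):
    # row[0] is the string; the tuple itself has no split() method.
    return len(row[0].split(" ")) < 5


short_rows = [row for row in rows if row_has_fewer_than_five_words(row)]
short_row_count = len(short_rows)

print(short_row_count)
```

With PySpark, the equivalent would be along the lines of `rdd.filter(row_has_fewer_than_five_words).count()`, assuming the RDD elements have this shape.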