Pyspark Add New Column Field With The Data Frame Row Number
Hi, I'm trying to build a recommendation system with Spark. I have a data frame with user emails and movie ratings. df = pd.DataFrame(np.array([['',2,3],['',5,5]
Solution 1:
"Primary keys with Apache Spark" practically answers your question, but in this particular case using StringIndexer
could be a better choice:
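from pyspark.ml.feature import StringIndexer
# Map each distinct value in "user" to a numeric index stored in "user_id"
indexer = StringIndexer(inputCol="user", outputCol="user_id")
# fit() learns the string-to-index mapping; transform() appends the new column
indexed = indexer.fit(sparkdf).transform(sparkdf)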
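To get a feel for what the indexer produces without starting a Spark session, here is a rough plain-Python sketch of the same idea. It is not Spark's implementation: the helper name `string_index` and the sample addresses are made up for illustration, and the ordering (most frequent value first, ties broken alphabetically) is an assumption modelled on StringIndexer's default frequency-descending behaviour. Spark also stores the index as a double (0.0, 1.0, ...), while this sketch uses plain integers.

```python
from collections import Counter


def string_index(values):
    """Map each distinct string to an integer id, most frequent value first."""
    counts = Counter(values)
    # Sort by descending frequency; ties are broken alphabetically (assumed).
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: idx for idx, value in enumerate(ordered)}


# Hypothetical sample data standing in for the "user" column.
users = ["a@example.com", "b@example.com", "a@example.com", "c@example.com"]

mapping = string_index(users)
user_ids = [mapping[u] for u in users]  # one id per row, same order as users
```

The point is that every occurrence of the same email gets the same id, which is what a recommender needs. A plain row number would give each row a different value even when the user is the same.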