
Pyspark Sql Compare Records On Each Day And Report The Differences

The problem I have is this dataset: it shows which businesses are doing business on specific days. What I want to achieve is to report which businesses are added or removed on each day.

Solution 1:

Below is the DataFrame operation for your question. You might need to tweak it a little, as I don't have your sample data; I wrote the code by looking at your data. Please let me know if this solves your problem:

import pyspark.sql.functions as F
from pyspark.sql import Window

# For each security, order its rows by date so that F.first picks up
# the value from the earliest day on which it appears.
some_win = Window.partitionBy("securityDesc").orderBy(F.col("[date]").asc())

some_table.withColumn(
    "business_added_day",
    F.first(F.col("id")).over(some_win)
).select(
    "business_added_day",
    "securityDesc",
    "TradedVolumSum",
    "Mnemonic"
).distinct().orderBy("business_added_day").show()
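If what you ultimately want is the day-over-day difference (which businesses appear or disappear between consecutive days), the underlying logic is a set difference per pair of adjacent dates. Here is a minimal plain-Python sketch of that logic (the function name `daily_diffs` and the sample rows are made up for illustration); in PySpark the same idea maps to grouping by date with `collect_set` and joining each day to the previous one.

```python
# Day-over-day diff: which businesses appear (added) or disappear
# (removed) between consecutive days. Plain-Python sketch of the
# set logic behind the report.
def daily_diffs(records):
    """records: iterable of (date, business) pairs; dates must sort
    chronologically (e.g. ISO strings)."""
    by_day = {}
    for day, biz in records:
        by_day.setdefault(day, set()).add(biz)
    days = sorted(by_day)
    diffs = {}
    for prev, curr in zip(days, days[1:]):
        diffs[curr] = {
            "added": sorted(by_day[curr] - by_day[prev]),
            "removed": sorted(by_day[prev] - by_day[curr]),
        }
    return diffs

rows = [
    ("2020-01-01", "A"), ("2020-01-01", "B"),
    ("2020-01-02", "B"), ("2020-01-02", "C"),
]
# daily_diffs(rows)["2020-01-02"] -> {"added": ["C"], "removed": ["A"]}
```

Once this logic matches what you expect on a small sample, translating it to Spark keeps the computation distributed over the full dataset.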
