
Customize Large Datasets Comparison In Pyspark

I'm using the code below to compare two DataFrames and identify the differences. However, I'm noticing that I'm simply overwriting my values (combine_df). My goal is to flag if row

Solution 1:

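Have you used the correct DataFrame? Both selections in your code read from expected_df, so module_df is never actually compared.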

# Instead of this (both selections read from expected_df):
filter_module = expected_df.select(list(cols))
filter_expected = expected_df.select(list(cols))
# use this (filter_module reads from module_df):
filter_module = module_df.select(list(cols))
filter_expected = expected_df.select(list(cols))
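
Once each selection reads from its own DataFrame, one way to see the differences is to subtract the selections from each other in both directions. This is only a minimal sketch and not the asker's original code: the helper name diff_selected_columns is hypothetical, and it assumes module_df and expected_df are PySpark DataFrames that both contain every column in cols.

```python
def diff_selected_columns(module_df, expected_df, cols):
    """Return (only_in_module, only_in_expected) for the columns in `cols`.

    Hypothetical helper for illustration; it assumes both arguments are
    PySpark DataFrames that contain every column named in `cols`.
    """
    cols = list(cols)

    # Select from each DataFrame separately. Selecting from expected_df
    # twice would compare it with itself and always report no differences.
    filter_module = module_df.select(cols)
    filter_expected = expected_df.select(cols)

    # subtract() keeps the distinct rows of the left side that do not
    # appear on the right side, so run it in both directions.
    only_in_module = filter_module.subtract(filter_expected)
    only_in_expected = filter_expected.subtract(filter_module)
    return only_in_module, only_in_expected
```

Both results are empty when the selected columns match. Note that subtract() ignores duplicate rows; if duplicates matter, exceptAll() (Spark 2.4 and later) may be a better fit.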
