Customize Large Datasets Comparison In Pyspark
I'm using the code below to compare two DataFrames and identify the differences. However, I'm noticing that I'm simply overwriting my values (combine_df). My goal is to flag if row
Solution 1:
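Have you used the correct DataFrame? In your code both filter_module and filter_expected are selected from expected_df, so expected_df ends up being compared with itself.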
# Instead of this (both selections read from expected_df):
filter_module = expected_df.select(list(cols))
filter_expected = expected_df.select(list(cols))
# use this (filter_module reads from module_df):
filter_module = module_df.select(list(cols))
filter_expected = expected_df.select(list(cols))