Customize Large Dataset Comparison in PySpark

October 08, 2023

I'm using the code below to compare two DataFrames and identify the differences. However, I'm noticing that I'm simply overwriting my values (combine_df). My goal is to Flag if row

Solution 1:

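Are you selecting from the correct DataFrame? In the snippet, both filter_module and filter_expected are built from expected_df, so the two sides of the comparison are identical.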

# Instead of this (both selections read from expected_df):
filter_module = expected_df.select(list(cols))
filter_expected = expected_df.select(list(cols))
# use this (the module side reads from module_df):
filter_module = module_df.select(list(cols))
filter_expected = expected_df.select(list(cols))
