How To Compare Two Dataframes Ignoring Column Names?
Solution 1:
pd.DataFrame is built around pd.Series, so it's unlikely you will be able to perform comparisons without column names. But the most efficient way would be to drop down to numpy:
assert_equal = (df.values == df_equal.values).all()
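As a minimal, self-contained sketch of the comparison above: the frames and their contents here are made up for illustration, and the sketch assumes both frames have the same shape. The values match even though the column labels differ.

```python
import pandas as pd

# Hypothetical example frames: same values, different column names.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_equal = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# .values drops the labels, so only the underlying arrays are compared.
values_match = bool((df.values == df_equal.values).all())
print(values_match)  # True
```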
To deal with np.nan, you can use np.testing.assert_equal and catch AssertionError, as suggested by @Avaris:
import numpy as np

def nan_equal(a, b):
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True

assert_equal = nan_equal(df.values, df_equal.values)
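To see why the NaN handling matters, here is a small sketch with made-up data (the frame contents are assumptions, not from the question). The plain == comparison reports a mismatch because NaN is never equal to NaN, while np.testing.assert_equal accepts NaNs that sit in the same position.

```python
import numpy as np
import pandas as pd


def nan_equal(a, b):
    """Return True if a and b are equal, treating NaNs in the same position as equal."""
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True


# Hypothetical frames: a NaN in the same cell, different column names.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df_equal = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, 4.0]})

plain = bool((df.values == df_equal.values).all())   # NaN != NaN, so this fails
nan_aware = nan_equal(df.values, df_equal.values)    # NaNs in the same cell match
print(plain, nan_aware)  # False True
```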
Solution 2:
I just needed to get the values (numpy array) from the data frame, so the column names won't be considered.
df.eq(df_equal.values).all().all()
I would still like to see a parameter on equals or assert_frame_equal. Maybe I am missing something.
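Until such a parameter exists, one possible workaround (a sketch, not something confirmed in this thread) is to copy one frame, give the copy the other frame's column labels, and then call equals or pd.testing.assert_frame_equal. The frames below are hypothetical, and this assumes both have the same number of columns.

```python
import pandas as pd

# Hypothetical frames: same values, different column names.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_equal = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Relabel a copy so the original frame is left untouched.
relabeled = df_equal.copy()
relabeled.columns = df.columns

# Raises AssertionError if the frames differ; passes silently otherwise.
pd.testing.assert_frame_equal(df, relabeled)

frames_equal = df.equals(relabeled)
print(frames_equal)  # True
```

Unlike the raw-values comparison, this still checks the index and the dtypes, which may or may not be what you want.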
An advantage of the eq approach compared to @jpp's answer is that I can see which columns do not match by calling all() only once:
df.eq(df_diff.values).all()
Out[24]:
A True
B False
dtype: bool
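For a version of this that can be run as is, here is a sketch that defines the frames explicitly. The contents of df and df_diff are assumptions chosen so that only the second column differs.

```python
import pandas as pd

# Hypothetical frames: the second column differs in its last row.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_diff = pd.DataFrame({"x": [1, 2], "y": [3, 5]})

# Compare by position; the result keeps df's column labels.
per_column = df.eq(df_diff.values).all()
print(per_column)

# Names (from df) of the columns whose values do not all match.
mismatched = per_column[~per_column].index.tolist()
print(mismatched)  # ['B']
```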
One problem is that eq does not treat np.nan as equal to np.nan. In that case, the following expression serves well:
(df.eq(df_equal.values) | (df.isnull().values & df_equal.isnull().values)).all().all()
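The same expression can be wrapped in a small helper that returns the per-column result. columns_match is a hypothetical name and the data is made up; the sketch assumes both frames have the same shape.

```python
import numpy as np
import pandas as pd


def columns_match(left, right):
    """Per-column equality by position, counting NaNs in the same cell as equal.

    Hypothetical helper; assumes both frames have the same shape.
    """
    both_nan = left.isnull().values & right.isnull().values
    return (left.eq(right.values) | both_nan).all()


# Hypothetical frames with a NaN in the same cell and different column names.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df_equal = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, 4.0]})

result = columns_match(df, df_equal)
print(result)
all_match = bool(result.all())
print(all_match)  # True
```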
Solution 3:
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Compare cell by cell, by position only
for i in range(df1.shape[0]):
    for j in range(df1.shape[1]):
        print(df1.iloc[i, j] == df2.iloc[i, j])
Will return:
True
True
True
True
Same thing for:
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
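One obvious issue is that column names matter in pandas because they can determine column order. For example, in older versions (before pandas 0.23, or on Python older than 3.6) columns built from a dict are sorted by name: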
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [1, 2], 'B': [3, 4]})
print(df1)
print(df2)
renders as ('B' is before 'a' in df2):
   a  b
0  1  3
1  2  4
   B  a
0  3  1
1  4  2
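Since a positional comparison depends on column order, one way to avoid surprises is to state the order explicitly with the columns argument. This is a sketch under that assumption, reusing the frames above. The second comparison sorts the columns by name on purpose to show how a different order changes the result.

```python
import pandas as pd

# Pin the column order explicitly instead of relying on dict ordering.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, columns=["a", "b"])
df2 = pd.DataFrame({"a": [1, 2], "B": [3, 4]}, columns=["a", "B"])

# Same positions, same values: the labels are ignored.
same_order = bool((df1.values == df2.values).all())

# Sorting df2's columns by name puts 'B' before 'a', so positions no longer line up.
df2_sorted = df2[sorted(df2.columns)]
after_sorting = bool((df1.values == df2_sorted.values).all())

print(same_order, after_sorting)  # True False
```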