
How To Compare Two Dataframes Ignoring Column Names?

Suppose I want to compare the content of two dataframes, but not the column names (or index names). Is it possible to achieve this without renaming the columns? For example: df = p

Solution 1:

pd.DataFrame is built around pd.Series, so it's unlikely you will be able to perform comparisons without column names.

But the most efficient way would be to drop down to numpy:

assert_equal = (df.values == df_equal.values).all()
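As a minimal, self-contained sketch of the same idea: the frames below are made up for illustration, and values_equal is a hypothetical helper name, not something pandas provides. It also checks the shapes first, since comparing arrays of different shapes would otherwise broadcast or fail.

```python
import pandas as pd

# Illustrative frames: same content as df, but different column names.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_equal = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_diff = pd.DataFrame({"x": [1, 2], "y": [3, 5]})


def values_equal(left, right):
    """Compare two frames by position only, ignoring row and column labels."""
    a, b = left.to_numpy(), right.to_numpy()
    # Guard against a shape mismatch before the element-wise comparison.
    return a.shape == b.shape and bool((a == b).all())


print(values_equal(df, df_equal))  # same content, different labels
print(values_equal(df, df_diff))   # one cell differs
```

Note that this plain comparison still treats np.nan as unequal to itself.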

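Because np.nan is not equal to itself, the comparison above returns False whenever the frames contain missing values. To deal with np.nan, you can use np.testing.assert_equal and catch AssertionError, as suggested by @Avaris: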

import numpy as np

def nan_equal(a, b):
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True

assert_equal = nan_equal(df.values, df_equal.values)
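To see the difference on data with missing values, here is a small sketch. The frames left and right are invented for the example; nan_equal is the helper from this answer.

```python
import numpy as np
import pandas as pd


def nan_equal(a, b):
    # np.testing.assert_equal treats NaNs in the same positions as equal.
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True


# Illustrative frames with a NaN in the same position and different labels.
left = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
right = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, 4.0]})

plain = bool((left.values == right.values).all())  # NaN != NaN, so this fails
nan_aware = nan_equal(left.values, right.values)   # NaNs match by position

print(plain, nan_aware)
```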

Solution 2:

I just needed to get the values (numpy array) from the data frame, so the column names won't be considered.

df.eq(df_equal.values).all().all()

I would still like to see a parameter on equals, or assert_frame_equal. Maybe I am missing something.


An advantage of this over @jpp's answer is that I can see which columns do not match by calling all() only once:

df.eq(df_diff.values).all()
Out[24]: 
A     True
B    False
dtype: bool

One problem is that eq does not treat np.nan as equal to np.nan. In that case, the following expression works well:

(df.eq(df_equal.values) | (df.isnull().values & df_equal.isnull().values)).all().all()
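A runnable sketch of that NaN-aware expression, keeping the per-column result so you can still see which columns differ. The sample frames and the columns_match name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frames: NaN in the same cell, different column names.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df_equal = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, 4.0]})
df_diff = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, 5.0]})


def columns_match(left, right):
    """Per-column positional equality, counting NaN == NaN as a match."""
    other = right.to_numpy()
    # Cells where both frames hold a missing value.
    both_nan = left.isna().to_numpy() & right.isna().to_numpy()
    return (left.eq(other) | both_nan).all()


print(columns_match(df, df_equal))
print(columns_match(df, df_diff))
```

Calling .all() once more on the result collapses it to a single boolean.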

Solution 3:

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

for i in range(df1.shape[0]):
    for j in range(df1.shape[1]):
        print(df1.iloc[i, j] == df2.iloc[i, j])

Will return:

True
True
True
True

Same thing for:

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

One obvious issue is that column names matter to pandas when it orders the columns of a dataframe. For example:

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [1, 2], 'B': [3, 4]})
print(df1)
print(df2)

renders as ('B' is before 'a' in df2):

   a  b
0  1  3
1  2  4
   B  a
0  3  1
1  4  2
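A short sketch of why that ordering matters for a positional comparison. Recent pandas versions keep the dict insertion order instead of sorting the keys, so the example below forces the 'B'-before-'a' order explicitly. The variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# Reorder explicitly to reproduce the layout shown above ('B' before 'a').
df2 = pd.DataFrame({"a": [1, 2], "B": [3, 4]})[["B", "a"]]

# The same values sit in different positions, so a positional check fails.
same_position = bool((df1.to_numpy() == df2.to_numpy()).all())

# Putting the columns back in the intended order makes the check pass.
same_after_reorder = bool((df1.to_numpy() == df2[["a", "B"]].to_numpy()).all())

print(same_position, same_after_reorder)
```

So when you ignore column names, the comparison depends entirely on the column order, and it is worth checking that order first.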
