Most Efficient Way To Un-dummy Variables In Pandas Df

So in the screenshot below, we have 3 different energy sites, ID01, ID18, and ID31. They're in a dummy variable type of format, and for visualization purposes I want to just create

Solution 1:

Setup

import pandas as pd

# Five rows of one-hot site flags plus two unrelated columns, A and B
data = pd.DataFrame([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0]
], columns=['ID01', 'ID18', 'ID31']).assign(A=1, B=2)

data

   ID01  ID18  ID31  A  B
0     1     0     0  1  2
1     0     1     0  1  2
2     0     0     1  1  2
3     1     0     0  1  2
4     0     1     0  1  2

dot product with strings and objects.

This works only if these are true dummy values, 0 or 1, with exactly one 1 in each row.

def undummy(d):
    return d.dot(d.columns)

data.assign(Site=data.filter(regex='^ID').pipe(undummy))

   ID01  ID18  ID31  A  B  Site
0     1     0     0  1  2  ID01
1     0     1     0  1  2  ID18
2     0     0     1  1  2  ID31
3     1     0     0  1  2  ID01
4     0     1     0  1  2  ID18
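
Why the dot product works: multiplying a Python string by an integer repeats it, so 1 * 'ID18' is 'ID18' and 0 * 'ID18' is the empty string, and summing strings concatenates them. The sketch below reproduces the result by hand for comparison. The names undummy_by_hand and messy are made up for illustration and are not part of the original answer.

```python
import pandas as pd

cols = ['ID01', 'ID18', 'ID31']
dummies = pd.DataFrame([[1, 0, 0], [0, 1, 0], [0, 0, 1]], columns=cols)

def undummy_by_hand(row):
    # Hypothetical helper: int * str repeats the string,
    # so a 1 keeps the column name and a 0 contributes ''.
    return ''.join(int(flag) * name for flag, name in zip(row, cols))

by_hand = [undummy_by_hand(row) for row in dummies.to_numpy()]
by_dot = dummies.dot(dummies.columns).tolist()

# Rows that are not one-hot show why the 0/1 assumption matters.
messy = pd.DataFrame([[1, 1, 0], [0, 0, 0]], columns=cols)
messy_sites = messy.dot(messy.columns).tolist()
```

On the messy frame, the row with two 1s comes back as the concatenation 'ID01ID18' and the all-zero row as an empty string, so it is worth checking the input first if the data may not be strictly one-hot.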

argmax slicing

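This works, but it can produce unexpected results if the data is not as represented in the question: argmax returns the position of the first maximum, so an all-zero row or a row with several 1s still gets a column name.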

def undummy(d):
    return d.columns[d.values.argmax(1)]

data.assign(Site=data.filter(regex='^ID').pipe(undummy))

   ID01  ID18  ID31  A  B  Site
0     1     0     0  1  2  ID01
1     0     1     0  1  2  ID18
2     0     0     1  1  2  ID31
3     1     0     0  1  2  ID01
4     0     1     0  1  2  ID18
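
To make the argmax caveat concrete, here is a minimal sketch of what happens on rows that are not one-hot, plus one possible guard. The helper undummy_checked and the messy frame are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

cols = ['ID01', 'ID18', 'ID31']

def undummy_checked(d):
    """Hypothetical helper: argmax lookup that rejects rows that are not one-hot."""
    values = d.to_numpy()
    is_binary = ((values == 0) | (values == 1)).all(axis=1)
    one_hot = is_binary & (values.sum(axis=1) == 1)
    if not one_hot.all():
        bad = d.index[~one_hot].tolist()
        raise ValueError(f"rows are not one-hot encoded: {bad}")
    # Look up the column name at the position of the single 1 in each row.
    names = d.columns.to_numpy()[values.argmax(axis=1)]
    return pd.Series(names, index=d.index)

clean = pd.DataFrame([[1, 0, 0], [0, 0, 1]], columns=cols)
messy = pd.DataFrame([[1, 0, 0], [0, 0, 0], [0, 1, 1]], columns=cols)

# Plain argmax silently labels the all-zero row and the double-flag row.
naive = list(messy.columns[messy.values.argmax(1)])

clean_sites = undummy_checked(clean).tolist()
try:
    undummy_checked(messy)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Raising on bad rows is only one choice; returning a missing value for those rows would also be reasonable, depending on how the Site column is used downstream.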