
Randomizing/shuffling Rows In A Dataframe In Pandas

I am currently trying to find a way to randomize the items within each row of a dataframe. I found this thread on shuffling/permuting column-wise in pandas (shuffling/permutating a DataFram

Solution 1:

Edit: I misunderstood the question, which was only to shuffle the values within each row, not across the whole table (right?)

I don't think a DataFrame makes much sense here, because the column names become meaningless once values move between columns. You can just use a 2D NumPy array:

In [1]: A
Out[1]: 
array([[11, 'Blue', 'Mon'],
       [8, 'Red', 'Tues'],
       [10, 'Green', 'Wed'],
       [15, 'Yellow', 'Thurs'],
       [11, 'Black', 'Fri']], dtype=object)

In [2]: _ = [np.random.shuffle(i) for i in A] # shuffle in-place, so return None

In [3]: A
Out[3]: 
array([['Mon', 11, 'Blue'],
       [8, 'Tues', 'Red'],
       ['Wed', 10, 'Green'],
       ['Thurs', 15, 'Yellow'],
       [11, 'Black', 'Fri']], dtype=object)

And if you want to keep a DataFrame:

In [4]: pd.DataFrame(A, columns=data.columns)
Out[4]: 
  Number  color     day
0    Mon     11    Blue
1      8   Tues     Red
2    Wed     10   Green
3  Thurs     15  Yellow
4     11  Black     Fri
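The list-comprehension shuffle above modifies `A` in place. If you would rather leave the original untouched and keep the index, a minimal sketch (assuming NumPy's `default_rng` Generator API is available; the helper name `shuffle_within_rows` is hypothetical) draws one random key per cell and argsorts along each row, which yields an independent random permutation of every row:

```python
import numpy as np
import pandas as pd


def shuffle_within_rows(df, seed=None):
    """Return a copy of df where each row's values are permuted independently.

    Hypothetical helper: index and column labels are kept, only cell values move.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy()  # object dtype when columns mix ints and strings
    # One random key per cell; argsort along axis=1 turns the keys into a
    # random ordering of column positions, chosen separately for each row.
    order = rng.random(values.shape).argsort(axis=1)
    shuffled = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(shuffled, index=df.index, columns=df.columns)


data = pd.DataFrame({
    "Number": [11, 8, 10, 15, 11],
    "color": ["Blue", "Red", "Green", "Yellow", "Black"],
    "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
})
print(shuffle_within_rows(data, seed=0))
```

Passing a `seed` makes the result reproducible; leaving it as `None` gives a fresh shuffle each call.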

Here is a function that shuffles all values across both rows and columns:

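import numpy as np
import pandas as pd

def shuffle(df):
    col = df.columns
    val = df.values                # 2D array of every cell
    shape = val.shape
    val_flat = val.flatten()       # 1D copy, so df itself is not modified
    np.random.shuffle(val_flat)    # shuffle all cells in place
    # Rebuild the grid; note the index is reset to 0..n-1
    return pd.DataFrame(val_flat.reshape(shape), columns=col)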

In [2]: data
Out[2]: 
   Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri

In [3]: shuffle(data)
Out[3]: 
  Number  color     day
0    Fri    Wed  Yellow
1  Thurs  Black     Red
2  Green   Blue      11
3     11      8      10
4    Mon   Tues      15
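The flatten/shuffle/reshape function above uses the legacy global `np.random` state and resets the index. A variant sketch, assuming NumPy's `default_rng` Generator API (the name `shuffle_all_cells` is hypothetical), does the same whole-table shuffle with a seedable generator and keeps the original index:

```python
import numpy as np
import pandas as pd


def shuffle_all_cells(df, seed=None):
    """Shuffle every cell across the whole table (rows and columns).

    Hypothetical helper: same flatten/shuffle/reshape idea, but seedable
    and with the original index preserved.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    # permutation() returns a shuffled copy, so df itself is untouched.
    flat = rng.permutation(values.ravel())
    return pd.DataFrame(flat.reshape(values.shape), index=df.index, columns=df.columns)


data = pd.DataFrame({
    "Number": [11, 8, 10, 15, 11],
    "color": ["Blue", "Red", "Green", "Yellow", "Black"],
    "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
})
print(shuffle_all_cells(data, seed=0))
```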

Hope this helps

Solution 2:

Maybe flatten the 2D array, shuffle it, and reshape it back?

In [21]: data2=dataframe.values.flatten()

In [22]: np.random.shuffle(data2)

In [23]: dataframe2 = pd.DataFrame(data2.reshape(dataframe.shape), columns=dataframe.columns)

In [24]: dataframe2
Out[24]: 
  Number   color    day
0   Tues  Yellow     11
1    Red   Green    Wed
2  Thurs     Mon   Blue
3     15       8  Black
4    Fri      11     10

Solution 3:

Building on @jrjc's answer, I have posted https://stackoverflow.com/a/44686455/5009287, which uses np.apply_along_axis():

import numpy as np

a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]])
print(a)
[[10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]]

print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
 [22 21 20]
 [31 30 32]
 [40 41 42]]

See the full answer for how this can be integrated with a pandas DataFrame.
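One way to wrap the `np.apply_along_axis()` result back into a DataFrame is sketched below. This is not necessarily what the linked answer does; it assumes NumPy's `default_rng` Generator, and the column names `a`, `b`, `c` are made up for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded only so runs are reproducible

df = pd.DataFrame(
    [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]],
    columns=["a", "b", "c"],  # hypothetical column names
)

# Permute each row (axis 1) independently, then restore the original
# index and column labels around the shuffled array.
shuffled = pd.DataFrame(
    np.apply_along_axis(rng.permutation, 1, df.to_numpy()),
    index=df.index,
    columns=df.columns,
)
print(shuffled)
```

`apply_along_axis` calls `rng.permutation` once per row in a Python-level loop, which is fine for small frames. NumPy 1.20+ also provides `Generator.permuted(x, axis=1)`, which shuffles each row independently in one call.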
