Replace Nan In A Dataframe With Random Values

February 03, 2024

I have a data frame (data_train) with NaN values. A sample is given below:

republican n y republican n

Solution 1:

You can use the pandas DataFrame.update method, as follows:

1) Generate a random DataFrame with the same columns and index as the original one:

import numpy as np
import pandas as pd

M = len(df.index)
N = len(df.columns)
# Standard-normal values with the same shape, columns and index as df
ran = pd.DataFrame(np.random.randn(M, N), columns=df.columns, index=df.index)

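2) Then call update with overwrite=False, so that only the NaN values in df are replaced by the generated random values. With the default overwrite=True, update would overwrite every cell of df, not just the missing ones.

df.update(ran, overwrite=False)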
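As a hedged alternative sketch: DataFrame.where keeps each value where the condition is True and takes it from the second argument elsewhere, so passing the random frame there fills only the NaN cells and leaves observed values alone. The small DataFrame is hypothetical example data, and np.random.default_rng with a fixed seed is only used to make the draw reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with some missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# Standard-normal values with the same shape, index and columns as df
rng = np.random.default_rng(0)
ran = pd.DataFrame(rng.standard_normal(df.shape), index=df.index, columns=df.columns)

# Keep observed values; take the random value only where df is NaN
filled = df.where(df.notna(), ran)
print(filled)
```

Unlike update, where returns a new DataFrame instead of modifying df in place.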

In the above example I used values drawn from a standard normal distribution, but you can also use values picked at random from the non-missing values of the original DataFrame:

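import numpy as np
import pandas as pd

M = len(df.index)
N = len(df.columns)

# Pool all non-missing values of df and draw M x N of them with replacement
val = np.ravel(df.values)
val = val[~pd.isna(val)]
val = np.random.choice(val, size=(M, N))
ran = pd.DataFrame(val, columns=df.columns, index=df.index)

# overwrite=False: only the NaN cells of df are filled
df.update(ran, overwrite=False)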

Solution 2:

If you pass a random number to fillna, the random generator is called only once, so every NaN is filled with the same number.
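That behaviour can be checked on a small, hypothetical Series: the argument to fillna is evaluated once, before fillna runs, so every NaN receives the same scalar.

```python
import random

import numpy as np
import pandas as pd

# Hypothetical Series with three missing values
s = pd.Series([np.nan, np.nan, np.nan, 4.0])

# random.random() runs once here; fillna only ever sees a single number
filled = s.fillna(random.random())
print(filled.iloc[:3].nunique())  # prints 1
```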

So make sure that a new random number is generated for each value. For a DataFrame like this:

          Date    A    B
0   2015-01-01  NaN  NaN
1   2015-01-02  NaN  NaN
2   2015-01-03  NaN  NaN
3   2015-01-04  NaN  NaN
4   2015-01-05  NaN  NaN
5   2015-01-06  NaN  NaN
6   2015-01-07  NaN  NaN
7   2015-01-08  NaN  NaN
8   2015-01-09  NaN  NaN
9   2015-01-10  NaN  NaN
10  2015-01-11  NaN  NaN
11  2015-01-12  NaN  NaN
12  2015-01-13  NaN  NaN
13  2015-01-14  NaN  NaN
14  2015-01-15  NaN  NaN
15  2015-01-16  NaN  NaN

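I used the following code to fill the NaNs in column A:

import random
import pandas as pd

# Draw a new random number for each NaN; keep existing values unchanged
x['A'] = x['A'].apply(lambda v: random.random() * 1000 if pd.isna(v) else v)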

This gives something like:

          Date           A    B
0   2015-01-01   96.538211  NaN
1   2015-01-02  404.683392  NaN
2   2015-01-03  849.614253  NaN
3   2015-01-04  590.030660  NaN
4   2015-01-05  203.167519  NaN
5   2015-01-06  980.508258  NaN
6   2015-01-07  221.088002  NaN
7   2015-01-08  285.013762  NaN
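A vectorized sketch of the same idea, assuming a hypothetical frame x shaped like the one above but with some values already present in column A: draw exactly as many uniform numbers in [0, 1000) as there are NaNs and assign them through a boolean mask, so existing values are untouched and no Python function is called per row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a Date column, a partly missing column A, an empty column B
x = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=5),
    "A": [np.nan, 10.0, np.nan, np.nan, 20.0],
    "B": np.nan,
})

rng = np.random.default_rng(42)
mask = x["A"].isna()

# One independent uniform draw in [0, 1000) per missing entry of A
x.loc[mask, "A"] = rng.uniform(0, 1000, size=mask.sum())
print(x)
```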

Solution 3:

If you want to replace all NaNs in the DataFrame with random values from a list, you can do something like this:

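import numpy as np
import pandas as pd

# applymap returns a new DataFrame, so assign the result back
# (in pandas 2.1+, applymap is deprecated in favour of DataFrame.map)
df = df.applymap(lambda v: v if not pd.isna(v) else np.random.choice([1, 3]))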
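A hedged vectorized variant, which also avoids the applymap/map naming change: pre-draw one candidate from the list for every cell with a NumPy Generator, then use DataFrame.where to take the candidate only where the original cell is NaN. The DataFrame is hypothetical example data and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
df = pd.DataFrame({"a": [np.nan, 2.0, np.nan], "b": [7.0, np.nan, 9.0]})

rng = np.random.default_rng(1)
# One random pick from [1, 3] for every cell, same shape/labels as df
choices = pd.DataFrame(rng.choice([1, 3], size=df.shape), index=df.index, columns=df.columns)

# Use the pick only where the original cell is NaN
filled = df.where(df.notna(), choices)
print(filled)
```

This draws more numbers than there are NaNs, which is usually cheaper than calling a Python lambda once per cell.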

Solution 4:

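If you want to replace the NaNs in your columns using the hot-deck technique (filling each missing value with a randomly chosen observed value from the same column), I can propose something like this: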

import random
import numpy as np
import pandas as pd

def hot_deck(dataframe):
    dataframe = dataframe.copy()
    for col in dataframe.columns:
        assert dataframe[col].dtype in (np.float64, np.int64)
        # Donor pool: the distinct observed (non-NaN) values of this column
        liste_sample = dataframe[col].dropna().unique()
        # Replace each NaN with a randomly chosen donor; genuine zeros are kept
        dataframe[col] = dataframe[col].apply(lambda v: random.choice(liste_sample) if pd.isna(v) else v)
    return dataframe
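A vectorized sketch of the same hot-deck idea, under stated assumptions: the name hot_deck_vectorized and its seed parameter are hypothetical, and it draws donors from all observed values of a column (so frequent values are proportionally more likely) rather than from the distinct values; pass np.unique(donors) to rng.choice instead to match the equal weighting above. Note the genuine 0.0 in column b, which is preserved.

```python
import numpy as np
import pandas as pd

def hot_deck_vectorized(dataframe, seed=None):
    """Fill each column's NaNs with values drawn (with replacement) from that column's observed values."""
    rng = np.random.default_rng(seed)
    out = dataframe.copy()
    for col in out.columns:
        missing = out[col].isna()
        donors = out.loc[~missing, col].to_numpy()
        # Skip columns with nothing to fill or no donors to draw from
        if missing.any() and donors.size > 0:
            out.loc[missing, col] = rng.choice(donors, size=missing.sum())
    return out

# Hypothetical example data; column b contains a real zero
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan], "b": [np.nan, 0.0, 5.0, 5.0]})
filled = hot_deck_vectorized(df, seed=0)
print(filled)
```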

Alternatively, if you prefer to replace each NaN with a new random value, you can do something like the following. You just have to choose the maximum value of your random choices.

import random
import numpy as np
import pandas as pd

def hot_deck(dataframe, max_value):
    dataframe = dataframe.copy()
    for col in dataframe.columns:
        assert dataframe[col].dtype in (np.float64, np.int64)
        missing = dataframe[col].isna()
        if missing.any():
            # One distinct random integer in [0, max_value) per NaN (needs max_value >= number of NaNs)
            dataframe.loc[missing, col] = random.sample(range(max_value), int(missing.sum()))
    return dataframe

Solution 5:

Calling fillna() inside a loop with the limit parameter set to 1 fills only one NaN per call, so each NaN is replaced with a different random value.

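import random

# s is the pandas Series whose NaNs should be filled
while s.isnull().sum() != 0:
    # limit=1 fills only the first remaining NaN, with a freshly drawn value
    s.fillna(random.uniform(0, 100), inplace=True, limit=1)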
