Replacing columns in pandas

Time:03-04

I have a DataFrame with 80 columns, and I want to replace some randomly chosen columns with different values. I found solutions that use df["c"] = mylist, but what if I want to select a column at random and don't know its name? Something like colNum = 12, followed by df[colNum] = mylist. Here is the code I tried, with no luck:

def poisonData(data):
    newValues = []
    for i in range(6):
        colNum = np.random.randint(0,81)
        temp=data.iloc[:,colNum]
        for x in temp:
            newValues.append(float(x*colNum))
        se = pd.Series(newValues)
        data.columns[colNum] = se.values
    return data

I also tried data.iloc[:, colNum] = se.values. I can't figure out what I'm doing wrong :(

CodePudding user response:

Use numpy.random.choice to randomly select N columns:

N = 3
cols = np.random.choice(df.columns, size=N, replace=False)

Then, to loop:

for col in cols:
    df[col] # do something

or with a vectorized function:

df[cols] = df[cols].apply(something)

# OR

df[cols] = func(df[cols])
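
As a minimal end-to-end sketch of the approach above: the small 5x8 frame, the column names c0..c7, and the "multiply by 10" update are all assumptions made for illustration, standing in for the 80-column DataFrame and whatever transformation is actually needed.

```python
import numpy as np
import pandas as pd

# Hypothetical 5x8 frame standing in for the 80-column one in the question.
np.random.seed(0)
df = pd.DataFrame(np.random.rand(5, 8), columns=[f"c{i}" for i in range(8)])
original = df.copy()

N = 3
# Pick N distinct column labels; replace=False rules out duplicates.
cols = np.random.choice(df.columns, size=N, replace=False)

# Vectorized update of only the selected columns
# (multiplying by 10 is an arbitrary placeholder transformation).
df[cols] = df[cols] * 10

print("modified columns:", list(cols))
```

Because the selection works on labels rather than positions, the column names never need to be known in advance; every column not listed in cols is left as it was.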

CodePudding user response:

You could use random.sample and then .iloc to select any column without duplicates:

>>> index_of_random_cols = random.sample(range(len(df.columns)), 6)
>>> df.iloc[:, index_of_random_cols]

Then, you could fill these columns using a numpy array of random values:

... = np.random.rand(len(df.index), 6)

Resulting code:

>>> N_cols = 6
>>> index_of_random_cols = random.sample(range(len(df.columns)), N_cols)
>>> df.iloc[:, index_of_random_cols] = index_of_random_cols * np.random.rand(len(df.index), N_cols)
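
For comparison, here is a sketch of how the function from the question might look with the same positional idea. The name poison_data, the seed parameter, and the all-ones test frame are assumptions for illustration, not part of the original post. It keeps the question's transformation (each value multiplied by its column position) but draws positions with random.sample, so a position can neither repeat nor fall outside the frame, and it assigns through .iloc instead of writing to data.columns, which is an immutable Index.

```python
import random

import numpy as np
import pandas as pd


def poison_data(data, n_cols=6, seed=None):
    """Return (modified copy of data, list of column positions that were changed)."""
    rnd = random.Random(seed)
    out = data.copy()
    # sample() draws distinct positions in [0, number of columns),
    # so no position is repeated or out of range.
    positions = rnd.sample(range(out.shape[1]), n_cols)
    for pos in positions:
        # Same transformation as in the question: value * column position.
        out.iloc[:, pos] = out.iloc[:, pos].to_numpy(dtype=float) * pos
    return out, positions


# Hypothetical input: 4 rows, 80 numeric columns, all ones.
df = pd.DataFrame(np.ones((4, 80)))
poisoned, positions = poison_data(df, n_cols=6, seed=1)
print(sorted(positions))
```

Working on a copy leaves the caller's DataFrame untouched; drop the .copy() call if in-place modification is what you want.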