I have a DataFrame with 80 columns, and I want to replace the values in a few randomly chosen columns. The solutions I found use df["c"] = mylist, but what if I pick the column at random and don't know its name? Something like colNum = 12, and then df[colNum] = mylist. Here is the code I tried, with no luck:
def poisonData(data):
    newValues = []
    for i in range(6):
        colNum = np.random.randint(0,81)
        temp=data.iloc[:,colNum]
        for x in temp:
            newValues.append(float(x*colNum))
        se = pd.Series(newValues)
        data.columns[colNum] = se.values
    return data
I also tried data.iloc[:,colNum] = se.values. I can't find what I am doing wrong :(
CodePudding user response:
Use numpy.random.choice to randomly select N columns:
N = 3
cols = np.random.choice(df.columns, size=N, replace=False)
Then, to loop:
for col in cols:
    df[col] # do something
or with a vectorized function:
df[cols] = df[cols].apply(something)
# OR
df[cols] = func(df[cols])
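To make the steps above concrete, here is a minimal sketch. The 5x80 frame of random floats, the column names c0..c79, and the "double the values" operation are all made up for illustration; swap in your own data and function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 80-column frame from the question.
df = pd.DataFrame(np.random.rand(5, 80), columns=[f"c{i}" for i in range(80)])
original = df.copy()

N = 3
# replace=False guarantees N distinct column labels.
cols = list(np.random.choice(df.columns, size=N, replace=False))

# Overwrite the chosen columns in one vectorized assignment
# (doubling is only an example operation).
df[cols] = df[cols] * 2
```

Because the selection is done on df.columns, you never need to know the column names in advance; cols simply holds whichever labels were drawn.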
CodePudding user response:
You could use random.sample and then .iloc to select any column without duplicates:
>>> index_of_random_cols = random.sample(range(len(df.columns)), 6)
>>> df.iloc[:, index_of_random_cols]
Then, you could fill these columns using a numpy array of random values:
... = np.random.rand(len(df.index), 6)
Resulting code:
>>> import random
>>> import numpy as np
>>> N_cols = 6
>>> index_of_random_cols = random.sample(range(len(df.columns)), N_cols)
>>> df.iloc[:, index_of_random_cols] = index_of_random_cols * np.random.rand(len(df.index), N_cols)
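As for why the code in the question fails, three things stand out. data.columns is an Index, which is immutable, so data.columns[colNum] = ... raises a TypeError instead of writing data. np.random.randint(0,81) excludes the upper bound but can still return 80, which is out of range for 80 columns (valid positions are 0 to 79). And newValues is created once before the loop, so it keeps growing and no longer matches the number of rows. Below is a rough sketch of the same idea (scale each chosen column by its position) using positional selection. The name poison_data, its n_cols and seed parameters, and the all-ones test frame are my own choices for illustration, and the sketch assumes the columns hold floats.

```python
import random

import numpy as np
import pandas as pd


def poison_data(data, n_cols=6, seed=None):
    """Return a copy with n_cols random columns scaled by their position.

    Illustrative helper; assumes every column is float.
    """
    rng = random.Random(seed)
    # sample() draws distinct positions from 0 .. number_of_columns - 1.
    positions = rng.sample(range(data.shape[1]), n_cols)
    poisoned = data.copy()
    for pos in positions:
        # Build the new values from the untouched input, one column at a time.
        values = data.iloc[:, pos].to_numpy(dtype=float) * pos
        poisoned.iloc[:, pos] = values
    return poisoned, positions


# Hypothetical input: 4 rows, 80 float columns, all ones.
df = pd.DataFrame(np.ones((4, 80)))
out, positions = poison_data(df, seed=1)
```

Returning a copy keeps the original frame intact, which makes it easier to compare the data before and after. Drop the .copy() if you really want to modify the frame in place.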