I'm trying to fill a DataFrame column with random strings. With the code below I get a new 10-character string each time I run it, but it's the same string for every row.
How do I generate a new string for each row?
print(df)
0 eqFSwEJQqD
1 eqFSwEJQqD
2 eqFSwEJQqD
3 eqFSwEJQqD
4 eqFSwEJQqD
...
1019920 eqFSwEJQqD
1019921 eqFSwEJQqD
1019922 eqFSwEJQqD
1019923 eqFSwEJQqD
1019924 eqFSwEJQqD
I'd like, e.g.:
0 fGtryghjuYt
1 jUiKlOpYtrd
etc...
The code:
import random
from string import ascii_letters
df['ff'] = ''.join(random.choice(ascii_letters) for x in range(10))
CodePudding user response:
Use:
df['ff'] = [''.join(random.choice(ascii_letters) for x in range(10)) for _ in range(len(df))]
print(df)
Like in the example below:
import random
from string import ascii_letters
import pandas as pd
df = pd.DataFrame(data=list(range(10)), columns=["id"])
df['ff'] = [''.join(random.choice(ascii_letters) for x in range(10)) for _ in range(len(df))]
print(df)
Output
id ff
0 0 UKCYsUXRYi
1 1 vvYweriLfb
2 2 eYcCCnXfhW
3 3 xiyPisioWt
4 4 cjMOxAcULS
5 5 lgkxtFCbBx
6 6 pPeEOmfgkB
7 7 EBhBfticnM
8 8 hdQxePBmCq
9 9 KCPosrHfgz
The problem with your approach is that the right-hand side is evaluated only once, so it produces a single string, and pandas then assigns that one value to every row of the 'ff' column.
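To see the difference side by side, here is a minimal sketch. The helper name `random_string` and the column names `same` and `different` are made up for illustration and are not from the question:

```python
import random
from string import ascii_letters

import pandas as pd


def random_string(length=10):
    """Return one random string of ASCII letters (hypothetical helper)."""
    return ''.join(random.choice(ascii_letters) for _ in range(length))


df = pd.DataFrame({"id": range(5)})

# The call runs once; pandas broadcasts the resulting scalar to every row.
df["same"] = random_string()

# The call runs once per row, so each row gets its own string.
df["different"] = [random_string() for _ in range(len(df))]

print(df)
```

The `same` column should hold one repeated value, while `different` should hold a separate string on each row.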
CodePudding user response:
Use:
df['ff'] = df['col'].apply(lambda _: ''.join(random.choice(ascii_letters) for x in range(10)))
Or:
df['ff'] = [''.join(random.choice(ascii_letters) for x in range(10)) for _ in df.index]
Or:
import numpy as np
from string import ascii_letters
a = np.random.choice(list(ascii_letters), size=(10, len(df)))
df['ff'] = np.apply_along_axis(''.join, 0, a)
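If you go the NumPy route, a variant with the newer `Generator` API might look like the sketch below. The seed value and the small example frame are assumptions for illustration. Here each row of the character array becomes one string, so the array shape is `(len(df), 10)` rather than `(10, len(df))`:

```python
from string import ascii_letters

import numpy as np
import pandas as pd

# Seeded only so that repeated runs give the same strings (assumed value).
rng = np.random.default_rng(seed=0)

df = pd.DataFrame({"id": range(5)})

# Draw all characters in one call: one row of 10 letters per DataFrame row.
letters = np.array(list(ascii_letters))
chars = rng.choice(letters, size=(len(df), 10))

# Join each row of characters into a single 10-character string.
df["ff"] = [''.join(row) for row in chars]

print(df)
```

Drawing every character in a single call avoids calling `random.choice` in a Python loop for each letter, which may matter for a frame with around a million rows like the one in the question.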