I made a script that combines data from two different CSV files and generates a txt file made of different lines (prompts). What I want is to avoid repeating the same "fintag" value, so that all the prompts are different.
This script does exactly what I need, but it obviously repeats some of the values because ran is a random number.
I can't simply avoid repeating the same random number, because the random number is used across multiple columns. Creating a different variable for each column would solve it, but the number of columns is high, and it might even change over time.
The alternative is to remove the elements from the "asstag" list once they've been used, but the list is generated inside a for loop and I have no idea how to remove elements from a list while a for loop is iterating over it.
Input:
people = {'Name': ['mark', 'bill', 'tim', 'frank'],
          'Tag': ['color', 'animal', 'clothes', 'animal']}
dic = {'color': ['blu', 'green', 'red', 'yellow'],
       'animal': ['dog', 'cat', 'horse', 'shark'],
       'clothes': ['gloves', 'shoes', 'shirt', 'socks']}
Expected Output:
mark blu (or green, or red, or yellow)
bill horse (or dog, or cat, or shark)
tim socks (or gloves, or shoes, or shirt)
frank dog (or cat, or shark, but not horse if horse is already assigned to bill)
Code:
import random
import pandas as pd

people = pd.read_csv("people.csv")
dic = pd.read_csv("dic.csv")
nam = list(people.loc[:, "Name"])
tag = list(people.loc[:, "Tag"])
with open("test.txt", "w") as file:
    for n, t in zip(nam, tag):
        # all candidate values for this person's tag
        asstag = list(dic.loc[:, t])
        # random index into the column; the same value can be drawn twice
        ran = random.randint(0, len(asstag) - 1)
        fintag = asstag[ran]
        prompt = str(n) + " " + str(fintag)
        print(prompt)
        file.write(prompt + "\n")
Answer:
One approach is to select unique elements per tag using random.sample:
import pandas as pd
import random
from collections import Counter
random.seed(42)
people = pd.DataFrame({'Name': ['mark', 'bill', 'tim', 'frank'],
                       'Tag': ['color', 'animal', 'clothes', 'animal']})
dic = pd.DataFrame({'color': ['blu', 'green', 'red', 'yellow'],
                    'animal': ['dog', 'cat', 'horse', 'shark'],
                    'clothes': ['gloves', 'shoes', 'shirt', 'socks']})
names = list(people.loc[:, "Name"])
tags = list(people.loc[:, "Tag"])
samples_by_tag = {tag: random.sample(dic.loc[:, tag].unique().tolist(), count) for tag, count in Counter(tags).items()}
for name, tag in zip(names, tags):
    print(name, samples_by_tag[tag].pop())
Output
mark blu
bill horse
tim shirt
frank dog
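The snippet above only prints the pairs, while the question also writes them to a text file. A minimal sketch of that step follows, assuming plain dictionaries stand in for the two CSV files and that one prompt per line is wanted; the helper name build_prompts is hypothetical, not part of the original code.

```python
import random
from collections import Counter

# Plain dicts stand in for people.csv and dic.csv (an assumption for this sketch).
people = {"Name": ["mark", "bill", "tim", "frank"],
          "Tag": ["color", "animal", "clothes", "animal"]}
dic = {"color": ["blu", "green", "red", "yellow"],
       "animal": ["dog", "cat", "horse", "shark"],
       "clothes": ["gloves", "shoes", "shirt", "socks"]}


def build_prompts(people, dic):
    """Return one 'name value' string per person, never reusing a value."""
    tags = people["Tag"]
    # For each tag, draw as many distinct values as the tag occurs.
    samples_by_tag = {tag: random.sample(list(dict.fromkeys(dic[tag])), count)
                      for tag, count in Counter(tags).items()}
    return [f"{name} {samples_by_tag[tag].pop()}"
            for name, tag in zip(people["Name"], tags)]


prompts = build_prompts(people, dic)
with open("test.txt", "w") as file:
    # Join with newlines so every prompt lands on its own line.
    file.write("\n".join(prompts) + "\n")
```

Building the list first and writing it in one call keeps the sampling logic separate from the file handling.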
The idea is to sample n_i unique elements for each tag using random.sample, where n_i is the number of times the tag appears in tags. This is done in the line:
samples_by_tag = {tag: random.sample(dic.loc[:, tag].unique().tolist(), count) for tag, count in Counter(tags).items()}
For a given run, samples_by_tag can take the following value:
{'color': ['blu'], 'animal': ['dog', 'horse'], 'clothes': ['shirt']}
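One thing to keep in mind is that random.sample raises a ValueError when asked for more elements than the population holds, which happens if a tag is assigned to more people than it has distinct values. A possible sketch of an explicit check, using plain dictionaries instead of DataFrames; the function name sample_by_tag and the error message are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_by_tag(tags, pools):
    """Draw, for each tag, as many distinct values as the tag occurs in `tags`."""
    samples = {}
    for tag, count in Counter(tags).items():
        # Drop duplicate values while keeping their original order.
        unique_values = list(dict.fromkeys(pools[tag]))
        if count > len(unique_values):
            raise ValueError(
                f"tag {tag!r} is needed {count} times but has only "
                f"{len(unique_values)} distinct values"
            )
        samples[tag] = random.sample(unique_values, count)
    return samples


# Hypothetical pools: 'color' has only two values here.
pools = {"animal": ["dog", "cat", "horse", "shark"], "color": ["blu", "green"]}
samples = sample_by_tag(["color", "animal", "animal"], pools)
print(samples)
```

Calling sample_by_tag with three "color" entries and these pools would raise the ValueError, since only two distinct colors are available.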
Note that you need to remove random.seed(42) to make the script give random results every time. See the documentation on random.seed and the notes on reproducibility.
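To illustrate the effect of the seed, here is a small sketch that uses separate random.Random instances rather than the module-level functions; the list animals is just sample data taken from the question.

```python
import random

animals = ["dog", "cat", "horse", "shark"]

# Two generators created with the same seed produce the same draws.
first = random.Random(42).sample(animals, 2)
second = random.Random(42).sample(animals, 2)
print(first == second)  # True

# Without a seed argument the generator is seeded from the operating
# system (or the current time), so its draws normally differ between runs.
rng = random.Random()
print(rng.sample(animals, 2))
```

A dedicated generator like rng also keeps the script's draws independent of any other code that calls random.seed.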