I am a beginner in Python, and I have a question that is perhaps simple. I have a file, "file.txt", that can contain any number n of strings:
> file.txt
John
Rafa
Marta
...
n
This is loaded into the program with:
with open('/media/names.txt') as f:
    lines = f.read().splitlines()
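A minimal sketch of the same loading step, using io.StringIO as a stand-in for the real file so it runs on its own (the file contents are hypothetical). It also skips blank lines, which is worth doing here: a blank line in the file (for example an extra newline at the end) would become an empty string, and an empty pattern passed to str.contains matches every row.

```python
import io

# Stand-in for open('/media/names.txt'); these contents are made up.
f = io.StringIO("John\nRafa\nMarta\n\n")

# splitlines() drops the newline characters; the condition skips blank
# lines, and strip() removes stray spaces around each name.
lines = [line.strip() for line in f.read().splitlines() if line.strip()]
print(lines)  # ['John', 'Rafa', 'Marta']
```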
Next, I load a DataFrame from a CSV file that has a column named "Identifier" containing many names:
import pandas as pd

Registration = pd.read_csv('/media/Registration.csv',
                           sep='\t', header=0)
The goal is to search for each of the n strings separately and store each result in its own variable. For example, here is what I did for the first name in the list:
names_1 = Registration[Registration['Identifier'].str.contains(lines[1])]
print(names_1)
This keeps only the rows that have "John" as an identifier. However, I want to create as many DataFrames as there are items in "file.txt":
names_1 = Registration[Registration['Identifier'].str.contains(lines[1])]
names_2 = Registration[Registration['Identifier'].str.contains(lines[2])]
names_3 = Registration[Registration['Identifier'].str.contains(lines[3])]
...
names_n = Registration[Registration['Identifier'].str.contains(lines[n])]
But I'm stuck and don't know how to write this as a loop. Can someone help? Thanks!
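For reference, here is a minimal sketch of how the str.contains filter from the question behaves, on a small made-up DataFrame standing in for Registration.csv. It matches substrings, so "John" also matches "Johnny B"; na=False treats missing identifiers as non-matches (depending on the column's dtype, boolean indexing can otherwise fail on the resulting missing values); and regex=False matches the name as plain text rather than as a regular expression.

```python
import pandas as pd

# Hypothetical stand-in for Registration.csv
Registration = pd.DataFrame({
    "Identifier": ["John Smith", "Johnny B", "Rafa N", None, "Marta (MX)"],
    "Value": [1, 2, 3, 4, 5],
})

# Substring match: "John" matches both "John Smith" and "Johnny B".
# na=False: the missing identifier counts as "no match".
mask = Registration["Identifier"].str.contains("John", regex=False, na=False)
names_john = Registration[mask]
print(names_john)

# With the default regex=True, "Marta (" would raise a regex error
# (unbalanced parenthesis); with regex=False it is matched as plain text.
names_marta = Registration[
    Registration["Identifier"].str.contains("Marta (", regex=False, na=False)
]
print(names_marta)
```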
Answer:
Technically speaking, the answer to your question is that variables are stored in a dictionary. At module level, locals() returns that dictionary (the same one globals() returns), so it is possible to generate variables in a loop exactly as asked. Inside a function, writing to locals() does not reliably create variables.
# Creates names_0, names_1, ... when run at module level
for i, line in enumerate(lines):
    locals()[f'names_{i}'] = Registration[Registration['Identifier'].str.contains(line)]
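A caveat worth seeing in code: this works at module level because there locals() is the same dictionary as globals(). Inside a function, writing to locals() does not create a real variable. A minimal sketch (the function and variable names are made up for illustration):

```python
lines = ["John", "Rafa", "Marta"]

# At module level this really creates names_0, names_1, names_2
# (globals() is what locals() returns at module level).
for i, line in enumerate(lines):
    globals()[f"names_{i}"] = line.upper()

def make_vars_in_function(items):
    for i, item in enumerate(items):
        locals()[f"local_{i}"] = item  # does not create a real variable
    try:
        return local_0  # compiled as a global lookup, so it is not found
    except NameError:
        return None

print(names_1)                       # RAFA
print(make_vars_in_function(lines))  # None
```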
However, just because you can doesn't mean you should: generating variables this way is generally a bad idea.
Ask yourself: how would you access the nth variable? This path makes your data hard to work with. A better approach is to store the results in a data structure such as a list or a dictionary, which makes them easy to keep track of.
names = []
for line in lines:
    # names[0] holds the matches for lines[0], and so on
    names.append(Registration[Registration['Identifier'].str.contains(line)])
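The answer also mentions a dictionary. A minimal sketch of that variant, keyed by the name itself so a result can be looked up as names["John"] instead of by position (the sample data is made up; regex=False and na=False are optional safeguards added here):

```python
import pandas as pd

# Hypothetical sample data standing in for Registration.csv and names.txt
Registration = pd.DataFrame({
    "Identifier": ["John Smith", "Rafa N", "Marta L", "John Doe"],
    "Value": [10, 20, 30, 40],
})
lines = ["John", "Rafa", "Marta"]

# One filtered DataFrame per name, keyed by the name
names = {
    name: Registration[Registration["Identifier"].str.contains(name, regex=False, na=False)]
    for name in lines
}

print(names["John"])        # the "John Smith" and "John Doe" rows
print(len(names["Marta"]))  # 1
```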
Also note that Python list indexes start at 0, not 1, so lines[1] in the question refers to "Rafa", not "John".
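A quick illustration of 0-based indexing, plus one way to keep 1-based labels like names_1 if you want them, using enumerate with start=1 (here the labels are only dictionary keys, not variables):

```python
lines = ["John", "Rafa", "Marta"]

print(lines[0])  # John (first item)
print(lines[1])  # Rafa (second item)

# start=1 only changes the counter, not how the list is indexed
labels = {f"names_{i}": name for i, name in enumerate(lines, start=1)}
print(labels)  # {'names_1': 'John', 'names_2': 'Rafa', 'names_3': 'Marta'}
```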
Answer:
Python list indexes begin at 0.
Try a for loop like this:
for i in range(len(lines)):
    names = Registration[Registration['Identifier'].str.contains(lines[i])]
But then you'll need to keep each value of names, for example in a list:
name_list = []
for i in range(len(lines)):
    names = Registration[Registration['Identifier'].str.contains(lines[i])]
    name_list.append(names)
print(name_list)
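The same loop can be written more idiomatically by iterating over the names directly instead of over range(len(lines)). A minimal sketch with made-up sample data; regex=False and na=False are optional safeguards added here, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data standing in for Registration.csv and names.txt
Registration = pd.DataFrame({
    "Identifier": ["John Smith", "Rafa N", "Marta L", "John Doe"],
    "Value": [10, 20, 30, 40],
})
lines = ["John", "Rafa", "Marta"]

# One filtered DataFrame per name, in the same order as lines
name_list = [
    Registration[Registration["Identifier"].str.contains(name, regex=False, na=False)]
    for name in lines
]

# name_list[0] holds the matches for lines[0], and so on
for name, df in zip(lines, name_list):
    print(name, len(df))
```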
Try this! Enjoy coding!