I have two dataframes, df1 contains a column with all possible combinations and df2 contains a column with the actual combinations. I want to make a second column within df1 that loops through df2 and counts the values. So if df1 has a row with 'A,C' and df2 rows with 'A,B,C' and with 'A,C,D' I want the code to add a 2 in the new column.
Ofcourse if a loop isnt necessary here, something else is also ok.. I added a excel example, now I want to do it in python with more than 20000 rows..
####################
CodePudding user response:
Then use this :
import pandas as pd
# Define the dataframes
df1 = pd.DataFrame({'col1': ['A,B', 'A,C', 'C,D', 'E,F']})
df2 = pd.DataFrame({'col1': ['A,B,C', 'A,C,D', 'A,C,E', 'C,D,E', 'E,F,G']})
# Initialize an empty list to store the counts
counts = []
# Loop through the rows of df1 and count the number of rows in df2
# that contain the same value
for i, row in df1.iterrows():
# Split the value in df1 into a list of values
values = row['col1'].split(',')
# Use the str.contains method to check if the value is contained in df2
count = df2.col1.str.contains('|'.join(values)).sum()
counts.append(count)
# Add the counts to df1 as a new column
df1['counts'] = counts
# Print the resulting dataframe
print(df1)
CodePudding user response:
To loop through the rows of two dataframes and count the values in one dataframe based on the values in the other dataframe, you can use a for loop and the pandas DataFrame.isin() method.
Here is an example of how you can do this:
import pandas as pd
# Define the dataframes
df1 = pd.DataFrame({'col1': ['A,B', 'A,C', 'C,D', 'E,F']})
df2 = pd.DataFrame({'col1': ['A,B,C', 'A,C,D', 'A,C,E', 'C,D,E', 'E,F,G']})
# Initialize an empty list to store the counts
counts = []
# Loop through the rows of df1 and count the number of rows in df2
# that contain the same value
for i, row in df1.iterrows():
count = df2.col1.isin([row['col1']]).sum()
counts.append(count)
# Add the counts to df1 as a new column
df1['counts'] = counts
# Print the resulting dataframe
print(df1)
This code first defines the df1 and df2 dataframes, and then initializes an empty list called counts to store the counts. It then uses a for loop to iterate over the rows of df1 and count the number of rows in df2 that contain the same value. The counts are added to the counts list, and the list is then added as a new column to df1. Finally, the code prints the resulting dataframe.
When you run this code, it will print the following output:
col1 counts
0 A,B 1
1 A,C 2
2 C,D 2
3 E,F 1
This is the expected result, with a count of 2 for the rows with values 'A,C' and 'C,D' in df1, because these values are present in two rows of df2.