I'm an amateur user having some experiences VBA but trying to switch to Python because my beautiful new MBP runs VBA miserably. I'm trying to create a df column, based on whether another column value has occur-ed already. If it has, then the new column value is 0 on that row, if not 1.
For example: I want to create column C in the example below. How do I do it quickly?
A B C (to create column C)
0 001 USA 1
1 002 Canada 1
3 003 China 1
4 004 India 1
5 005 UK 1
6 006 Japan 1
7 007 USA 0
8 008 UK 0
CodePudding user response:
After creating your dataframe :
import pandas as pandas
data = [["001", "USA"], ["002", "Canada"], ["003", "China"],
["004", "India"], ["005", "UK"], ["006", "Japan"], ["007", "USA"], ["008", "UK"]]
# Create a dataframe
df = pandas.DataFrame(data, columns=["A", "B"])
You can apply a function to each value of one of the columns (in your case, the B
column) and have the output of the function as the value of your column.
df["C"] = df.B.apply(lambda x: 1 if df.B.value_counts()[x] == 1 else 0)
This will check if the value in the B column appears somewhere else in the column, and will return 1
if unique and 0
if duplicated.
The dataframe looks like this :
A B C
0 001 USA 0
1 002 Canada 1
2 003 China 1
3 004 India 1
4 005 UK 0
5 006 Japan 1
6 007 USA 0
7 008 UK 0
If you want the values to be recalculated each time you need to have the command
df["C"] = df.B.apply(lambda x: 1 if df.B.value_counts()[x] == 1 else 0)
executed each time after you add a row.
CodePudding user response:
You can check for duplicates on the 'B'
column and set duplicates to 0. Then set any non-duplicates to 1 like this:
df = pd.DataFrame({'A':[1, 2, 3, 4, 5, 6, 7, 8], 'B':['USA', 'Canada', 'China', 'India', 'UK', 'Jpan', 'USA', 'UK']})
df.loc[df['B'].duplicated(), 'C'] = 0
df['C'] = df['C'].fillna(1).astype(int)
print(df)
Output:
A B C
0 1 USA 1
1 2 Canada 1
2 3 China 1
3 4 India 1
4 5 UK 1
5 6 Jpan 1
6 7 USA 0
7 8 UK 0