For each row, I am trying to add a column that lists the names of the columns whose value is not zero. How do I do this in Pandas?
import pandas as pd
df = pd.DataFrame({
'date' : ['2021-08-01', '2021-08-01', '2021-08-01', '2021-08-02', '2021-08-02', '2021-08-02'],
'person': ['type_A', 'type_C', 'type_C', 'type_A', 'type_B', 'type_D'],
'London' : [1, 4, 0, 5, 7, 9],
'New York' : [0.0, 0.0, 3, 0.0, 0.0, 0.0],
'Boston' : [3, 6, 9, 1, 0.0, 7],
'Hong Kong' : [0.0, 0.0, 0.0, 0.0, 3, 0.0]
})
date person London New York Boston Hong Kong
0 2021-08-01 type_A 1 0.0 3.0 0.0
1 2021-08-01 type_C 4 0.0 6.0 0.0
2 2021-08-01 type_C 0 3.0 9.0 0.0
3 2021-08-02 type_A 5 0.0 1.0 0.0
4 2021-08-02 type_B 7 0.0 0.0 3.0
5 2021-08-02 type_D 9 0.0 7.0 0.0
Expected output:
date person London New York Boston Hong Kong Col_name
0 2021-08-01 type_A 1 0.0 3.0 0.0 London, Boston
1 2021-08-01 type_C 4 0.0 6.0 0.0 London, Boston
2 2021-08-01 type_C 0 3.0 9.0 0.0 New York, Boston
3 2021-08-02 type_A 5 0.0 1.0 0.0 London, Boston
4 2021-08-02 type_B 7 0.0 0.0 3.0 London, Hong Kong
5 2021-08-02 type_D 9 0.0 7.0 0.0 London, Boston
CodePudding user response:
Try:
country_cols = ['London', 'New York', 'Boston', 'Hong Kong']
df['Col_name'] = df[country_cols].apply(lambda x: ', '.join(x[x!=0].index.to_list()), axis=1)
Output:
date person London New York Boston Hong Kong Col_name
0 2021-08-01 type_A 1 0.0 3.0 0.0 London, Boston
1 2021-08-01 type_C 4 0.0 6.0 0.0 London, Boston
2 2021-08-01 type_C 0 3.0 9.0 0.0 New York, Boston
3 2021-08-02 type_A 5 0.0 1.0 0.0 London, Boston
4 2021-08-02 type_B 7 0.0 0.0 3.0 London, Hong Kong
5 2021-08-02 type_D 9 0.0 7.0 0.0 London, Boston
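To see what the lambda does, here is a small sketch that applies the same steps to a single row. The `row` Series below is hand-built to mimic what `apply(..., axis=1)` passes in for index 2; the variable names are only for illustration.

```python
import pandas as pd

# One row of the city columns, as apply(..., axis=1) would hand it to the lambda
row = pd.Series({'London': 0, 'New York': 3.0, 'Boston': 9.0, 'Hong Kong': 0.0})

nonzero = row[row != 0]          # boolean indexing keeps only the non-zero entries
names = nonzero.index.to_list()  # the index labels are the original column names
result = ', '.join(names)
print(result)
```

The index of a row Series is the set of column names, which is why filtering the values and then reading `.index` gives the labels you want.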
CodePudding user response:
Adapting this beautiful answer, you could avoid apply:
df["not_zeros"] = (
    df[df.columns[2:]].ne(0).astype("int")
    .dot(df.columns[2:] + ', ')
    .str.rstrip(', ')
)
date person London New York Boston Hong Kong not_zeros
0 2021-08-01 type_A 1 0.0 3.0 0.0 London, Boston
1 2021-08-01 type_C 4 0.0 6.0 0.0 London, Boston
2 2021-08-01 type_C 0 3.0 9.0 0.0 New York, Boston
3 2021-08-02 type_A 5 0.0 1.0 0.0 London, Boston
4 2021-08-02 type_B 7 0.0 0.0 3.0 London, Hong Kong
5 2021-08-02 type_D 9 0.0 7.0 0.0 London, Boston
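For anyone who wants to run this end to end, here is a self-contained sketch of the same dot-product idea. It assumes the city columns are picked by name rather than by position (the `city_cols` list is my own naming, not part of the answer above), so it does not accidentally include a result column added earlier. The trick is that `True * 'London, '` gives `'London, '` and `False * 'London, '` gives an empty string, so the row-wise dot product concatenates the labels of the non-zero columns.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-08-01', '2021-08-01', '2021-08-01',
             '2021-08-02', '2021-08-02', '2021-08-02'],
    'person': ['type_A', 'type_C', 'type_C', 'type_A', 'type_B', 'type_D'],
    'London': [1, 4, 0, 5, 7, 9],
    'New York': [0.0, 0.0, 3, 0.0, 0.0, 0.0],
    'Boston': [3, 6, 9, 1, 0.0, 7],
    'Hong Kong': [0.0, 0.0, 0.0, 0.0, 3, 0.0],
})

# Assumed: select the value columns explicitly instead of using df.columns[2:]
city_cols = ['London', 'New York', 'Boston', 'Hong Kong']

mask = df[city_cols].ne(0)            # True where the value is non-zero
labels = pd.Index(city_cols) + ', '   # 'London, ', 'New York, ', ...

# bool * str repeats the label 0 or 1 times; the dot product sums (joins) them per row
df['not_zeros'] = mask.dot(labels).str.rstrip(', ')
print(df)
```

Note that `rstrip(', ')` strips any trailing commas and spaces, so this would also trim a column name that itself ends in one of those characters.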
CodePudding user response:
Try the following code. It is a simpler, more explicit loop:
country_names = ['London', 'New York', 'Boston', 'Hong Kong']
# main list to save the country names for each row of the dataframe
col_name = []
for index, row in df.iterrows():
    # temporary list to save the country names for one row
    temp = []
    for country_name in country_names:
        if float(row[country_name]) != 0.0:
            temp.append(country_name)
    # append the country names to col_name as one comma-separated string
    col_name.append(", ".join(temp))
df["col_name"] = col_name