Assign column names into rows on the condition of value presence in Pandas

Time:09-24

I am trying to add a column that lists, for each row, the names of the city columns whose value is not zero. How do I do this in Pandas?

import pandas as pd

df = pd.DataFrame({
    'date' : ['2021-08-01', '2021-08-01', '2021-08-01', '2021-08-02', '2021-08-02', '2021-08-02'],
    'person': ['type_A', 'type_C', 'type_C', 'type_A', 'type_B', 'type_D'],
    'London' : [1, 4, 0, 5, 7, 9],
    'New York' : [0.0, 0.0, 3, 0.0, 0.0, 0.0],
    'Boston' : [3, 6, 9, 1, 0.0, 7],
    'Hong Kong' : [0.0, 0.0, 0.0, 0.0, 3, 0.0]
})

         date  person  London  New York  Boston  Hong Kong
0  2021-08-01  type_A       1       0.0     3.0        0.0
1  2021-08-01  type_C       4       0.0     6.0        0.0
2  2021-08-01  type_C       0       3.0     9.0        0.0
3  2021-08-02  type_A       5       0.0     1.0        0.0
4  2021-08-02  type_B       7       0.0     0.0        3.0
5  2021-08-02  type_D       9       0.0     7.0        0.0


Expected output:

         date  person  London  New York  Boston  Hong Kong           Col_name
0  2021-08-01  type_A       1       0.0     3.0        0.0     London, Boston
1  2021-08-01  type_C       4       0.0     6.0        0.0     London, Boston
2  2021-08-01  type_C       0       3.0     9.0        0.0   New York, Boston
3  2021-08-02  type_A       5       0.0     1.0        0.0     London, Boston
4  2021-08-02  type_B       7       0.0     0.0        3.0  London, Hong Kong
5  2021-08-02  type_D       9       0.0     7.0        0.0     London, Boston

Answer:

Try:

country_cols = ['London', 'New York', 'Boston', 'Hong Kong']
df['Col_name'] = df[country_cols].apply(lambda x: ', '.join(x[x!=0].index.to_list()), axis=1)

Output:

         date  person  London  New York  Boston  Hong Kong           Col_name
0  2021-08-01  type_A       1       0.0     3.0        0.0     London, Boston
1  2021-08-01  type_C       4       0.0     6.0        0.0     London, Boston
2  2021-08-01  type_C       0       3.0     9.0        0.0   New York, Boston
3  2021-08-02  type_A       5       0.0     1.0        0.0     London, Boston
4  2021-08-02  type_B       7       0.0     0.0        3.0  London, Hong Kong
5  2021-08-02  type_D       9       0.0     7.0        0.0     London, Boston
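If you would rather not hard-code the city list, one option (my own assumption, not part of the answer above) is to pick the numeric columns with select_dtypes. A minimal sketch, rebuilding the question's DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-08-01', '2021-08-01', '2021-08-01', '2021-08-02', '2021-08-02', '2021-08-02'],
    'person': ['type_A', 'type_C', 'type_C', 'type_A', 'type_B', 'type_D'],
    'London': [1, 4, 0, 5, 7, 9],
    'New York': [0.0, 0.0, 3, 0.0, 0.0, 0.0],
    'Boston': [3, 6, 9, 1, 0.0, 7],
    'Hong Kong': [0.0, 0.0, 0.0, 0.0, 3, 0.0],
})

# Select the value columns by dtype instead of listing them by hand;
# 'date' and 'person' hold strings, so they are excluded.
country_cols = df.select_dtypes(include='number').columns

# For each row, keep the labels whose value is non-zero, in column order.
df['Col_name'] = df[country_cols].apply(
    lambda row: ', '.join(row[row.ne(0)].index), axis=1
)
print(df['Col_name'].tolist())
```

Because Col_name holds strings, select_dtypes will not pick it up if the cell is run a second time, whereas a positional slice such as df.columns[2:] would.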

Answer:

Adapting a dot-product trick from another answer, you can avoid apply:

# Columns from position 2 onward are the city columns
cities = df.columns[2:]
df["not_zeros"] = (
    df[cities].ne(0).astype("int").dot(cities + ', ').str.rstrip(', ')
)

Output:

         date  person  London  New York  Boston  Hong Kong          not_zeros
0  2021-08-01  type_A       1       0.0     3.0        0.0     London, Boston
1  2021-08-01  type_C       4       0.0     6.0        0.0     London, Boston
2  2021-08-01  type_C       0       3.0     9.0        0.0   New York, Boston
3  2021-08-02  type_A       5       0.0     1.0        0.0     London, Boston
4  2021-08-02  type_B       7       0.0     0.0        3.0  London, Hong Kong
5  2021-08-02  type_D       9       0.0     7.0        0.0     London, Boston
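To see why the dot product works, here is a small sketch on a hypothetical two-column frame (the names small, A and B are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame, only to show the mechanics of the trick.
small = pd.DataFrame({'A': [1, 0], 'B': [2, 3]})

mask = small.ne(0).astype(int)   # 1 where the cell is non-zero, 0 elsewhere
labels = small.columns + ', '    # Index(['A, ', 'B, '], dtype='object')

# Each row becomes sum over j of mask[i, j] * labels[j]. On object data,
# 1 * 'A, ' == 'A, ', 0 * 'A, ' == '', and "+" concatenates strings,
# so each row is the concatenation of the labels of its non-zero cells.
joined = mask.dot(labels)
print(joined.tolist())                   # ['A, B, ', 'B, ']
print(joined.str.rstrip(', ').tolist())  # ['A, B', 'B']
```

Every non-empty result ends with a trailing ', ', which is why the answer finishes with .str.rstrip(', ').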

Answer:

Try the following code. It uses a plain loop over the rows instead of pandas tricks:

country_names = ['London', 'New York', 'Boston', 'Hong Kong']
# main list to save the country names for each row of the dataframe
col_name = []
for index, row in df.iterrows():
    # temporary list to save the country names for one row
    temp = []
    for country_name in country_names:
        if float(row[country_name]) != 0.0:
            temp.append(country_name)
    # append the country names to col_name as one comma-separated string
    col_name.append(", ".join(temp))

df["col_name"] = col_name
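For comparison, a loop-free sketch of the same filtering reshapes the data to long format with melt, keeps the non-zero cells, and joins the city names per original row. This is a different technique from the loop above, shown under the assumption that pandas is 1.1 or newer (melt's ignore_index parameter):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-08-01', '2021-08-01', '2021-08-01', '2021-08-02', '2021-08-02', '2021-08-02'],
    'person': ['type_A', 'type_C', 'type_C', 'type_A', 'type_B', 'type_D'],
    'London': [1, 4, 0, 5, 7, 9],
    'New York': [0.0, 0.0, 3, 0.0, 0.0, 0.0],
    'Boston': [3, 6, 9, 1, 0.0, 7],
    'Hong Kong': [0.0, 0.0, 0.0, 0.0, 3, 0.0],
})
country_names = ['London', 'New York', 'Boston', 'Hong Kong']

# One row per (original row, city) pair; ignore_index=False keeps the
# original row labels so we can group on them afterwards.
long = df.melt(value_vars=country_names, var_name='city', ignore_index=False)

# Keep only non-zero cells, then join the city names per original row.
# melt emits the cities column by column, so within each group they keep
# the same order as country_names.
joined = long.loc[long['value'].ne(0)].groupby(level=0)['city'].agg(', '.join)

# Rows where every city is zero would be missing from the groupby result,
# so reindex against the original rows and fill those with ''.
df['col_name'] = joined.reindex(df.index, fill_value='')
print(df['col_name'].tolist())
```

The reindex step is what keeps all-zero rows from silently disappearing; the loop version handles them naturally by appending an empty string.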