Create new column in a pandas DF, iterate per row and base the column on column headers

Time:10-05

This might sound like a duplicate question, but it is slightly different from what I have found here. I have the following pandas DataFrame (without the new column), and I want to create `new column`, which lists, for each row, the names of the columns whose value is non-zero:

| chicken | cow | moose | goat | pigeon | new column                       |
|:--------|:----|:------|:-----|:-------|:---------------------------------|
| 0       | 0   | 0     | 0.2  | 0      | [goat]                           |
| 0.4     | 0.2 | 0     | 0    | 0      | [chicken, cow]                   |
| 0.2     | 0   | -0.2  | 0.1  | 0.3    | [chicken, moose, goat, pigeon]   |
| 0       | 0   | 0     | 0    | 0      | [ ]                              |

I was thinking of creating an empty column first and then appending each column name to the new cell when its value is != 0. However, I haven't been able to find a solution. I feel it shouldn't be that hard. Any help from the community?
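The idea described above (build one list per row, adding a column name whenever its value is != 0) can be sketched with a plain loop. This is a minimal illustration that rebuilds the example frame by hand; it uses `DataFrame.iterrows`, which is simple but slow on large frames, and it collects all lists first and assigns them in one step instead of filling an empty column cell by cell.

```python
import pandas as pd

# The example data from the question, rebuilt by hand.
df = pd.DataFrame({
    "chicken": [0, 0.4, 0.2, 0],
    "cow": [0, 0.2, 0, 0],
    "moose": [0, 0, -0.2, 0],
    "goat": [0.2, 0, 0.1, 0],
    "pigeon": [0, 0, 0.3, 0],
})

# Collect one list per row, appending each column name whose value is non-zero.
new_values = []
for _, row in df.iterrows():
    names = []
    for col in df.columns:
        if row[col] != 0:
            names.append(col)
    new_values.append(names)

# Assign all lists at once; pandas stores them in an object column.
df["new column"] = new_values
print(df)
```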

Answer:

You can compare the values with 0 using `ne`, use `DataFrame.dot` with the column names plus a separator to build joined strings, and finally strip the trailing separator and split the strings into lists:

```python
df['new column'] = df.ne(0).dot(df.columns + ',').str[:-1].str.split(',')
```
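To see what the `dot` call produces, the sketch below rebuilds the example frame and prints the intermediate strings. Each cell is built from products such as `True * "cow,"` (which is `"cow,"`) and `False * "cow,"` (which is `""`), concatenated across the row. One consequence worth checking: a row of all zeros yields an empty string, and splitting `""` gives `[""]`, not `[]`. The last step shows one possible workaround; it is an assumption of this sketch, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "chicken": [0, 0.4, 0.2, 0],
    "cow": [0, 0.2, 0, 0],
    "moose": [0, 0, -0.2, 0],
    "goat": [0.2, 0, 0.1, 0],
    "pigeon": [0, 0, 0.3, 0],
})

mask = df.ne(0)                      # True where the value is non-zero
joined = mask.dot(df.columns + ",")  # e.g. row 1 -> "chicken,cow,"
print(joined.tolist())

# Strip the trailing comma and split; the all-zero row becomes [""].
lists = joined.str[:-1].str.split(",")
print(lists.tolist())

# Hypothetical workaround: map empty strings to empty lists instead.
fixed = joined.str[:-1].apply(lambda s: s.split(",") if s else [])
print(fixed.tolist())
```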

Or, if performance is important, use a list comprehension:

```python
cols = df.columns.to_numpy()
df['new column'] = [list(cols[x]) for x in df.ne(0).to_numpy()]
```
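A rough way to check the performance claim on your own data is to time the list comprehension against the `apply` version with `timeit`. The frame below is random, hypothetical data (10,000 rows of 0/1 values), and the function names are made up for this sketch; absolute timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows of 0/1 values, not from the question.
rng = np.random.default_rng(0)
big = pd.DataFrame(
    rng.integers(0, 2, size=(10_000, 5)),
    columns=["chicken", "cow", "moose", "goat", "pigeon"],
)


def with_list_comprehension(frame):
    cols = frame.columns.to_numpy()
    # Boolean indexing on a NumPy array selects the names where the mask is True.
    return [list(cols[x]) for x in frame.ne(0).to_numpy()]


def with_apply(frame):
    # One Python-level function call per row, which is usually slower.
    return frame.ne(0).apply(lambda x: list(x.index[x]), axis=1).tolist()


# Both approaches should produce the same lists before comparing speed.
assert with_list_comprehension(big) == with_apply(big)

for func in (with_list_comprehension, with_apply):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```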

Or, if performance is not important, use `apply` with `axis=1`:

```python
df['new column'] = df.ne(0).apply(lambda x: list(x.index[x]), axis=1)
```

```
print(df)
   chicken  cow  moose  goat  pigeon                      new column
0      0.0  0.0    0.0   0.2     0.0                          [goat]
1      0.4  0.2    0.0   0.0     0.0                  [chicken, cow]
2      0.2  0.0   -0.2   0.1     0.3  [chicken, moose, goat, pigeon]
3      0.0  0.0    0.0   0.0     0.0                              []
```

Answer:

I would use this approach.

Define a function to apply to every row: it filters the row (a pandas Series) to its non-zero values and returns their index labels, i.e. the column names.

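```python
import pandas as pd

d = {
    "chicken": [0, 0.4, 0.2, 0],
    "cow": [0, 0.2, 0, 0],
    "moose": [0, 0, -0.2, 0],
    "goat": [0.2, 0, 0.1, 0],
    "pigeon": [0, 0, 0.3, 0]
}

df = pd.DataFrame(data=d)

def get_col_names(row):
    # Keep the labels (column names) of every non-zero value in this row.
    # Use `!= 0` rather than `> 0` so negative values such as moose = -0.2 are kept.
    return row[row != 0].index.tolist()

# Pass the function directly; wrapping it in a lambda is unnecessary.
df["new_column"] = df.apply(get_col_names, axis=1)

print(df)
```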
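The choice of comparison matters for the third row of the example, which contains a negative value (moose = -0.2). The short check below compares filtering with `row > 0` and with `row != 0` on that row; only the second matches the expected `[chicken, moose, goat, pigeon]` from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "chicken": [0, 0.4, 0.2, 0],
    "cow": [0, 0.2, 0, 0],
    "moose": [0, 0, -0.2, 0],
    "goat": [0.2, 0, 0.1, 0],
    "pigeon": [0, 0, 0.3, 0],
})

row = df.iloc[2]  # chicken=0.2, cow=0, moose=-0.2, goat=0.1, pigeon=0.3

# `row > 0` keeps only positive values, so the negative moose value is dropped.
positive_only = row[row > 0].index.tolist()
print(positive_only)  # ['chicken', 'goat', 'pigeon']

# `row != 0` keeps every non-zero value, matching the expected output.
non_zero = row[row != 0].index.tolist()
print(non_zero)  # ['chicken', 'moose', 'goat', 'pigeon']
```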

