Home > front end >  combining multiple columns into one by checking for NaN
combining multiple columns into one by checking for NaN

Time:12-04

I'm struggling on the following case. I have a dataframe with columns containing NaN or a string value. How could I merge all 3 columns (i.e. Q3_4_4, Q3_4_5, and Q3_4_6 into one new column (i.e Q3_4) by only keeping the string value? This column would have 'hi' in row 1, 'bye' in row 2, and 'hello' in row3. Thank you for your help

{'Q3_4_4': {'R_00RfS8OP6QrTNtL': nan,
  'R_3JtmbtdPjxXZAwA': nan,
  'R_3G2sp6TEXZmf2KI': 'hello'
  },
 'Q3_4_5': {'R_00RfS8OP6QrTNtL': 'hi',
  'R_3JtmbtdPjxXZAwA': nan,
  'R_3G2sp6TEXZmf2KI': nan},
 'Q3_4_6': {'R_00RfS8OP6QrTNtL': nan,
  'R_3JtmbtdPjxXZAwA': 'bye',
  'R_3G2sp6TEXZmf2KI': nan},
 }

CodePudding user response:

If need join by columns names with removed last value after last _ in columns names use GroupBy.first per axis=1 (per columns) with lambda for select by columns names spiller from right by first _ and selecting:

nan = np.nan

df = pd.DataFrame({'Q3_4_4': {'R_00RfS8OP6QrTNtL': nan,
  'R_3JtmbtdPjxXZAwA': nan,
  'R_3G2sp6TEXZmf2KI': 'hello'
  },
 'Q3_4_5': {'R_00RfS8OP6QrTNtL': 'hi',
  'R_3JtmbtdPjxXZAwA': nan,
  'R_3G2sp6TEXZmf2KI': nan},
 'Q3_4_6': {'R_00RfS8OP6QrTNtL': nan,
  'R_3JtmbtdPjxXZAwA': 'bye',
  'R_3G2sp6TEXZmf2KI': nan},
 })

df = df.groupby(lambda x: x.rsplit('_', 1)[0], axis=1).first()
print (df)
                    Q3_4
R_00RfS8OP6QrTNtL     hi
R_3JtmbtdPjxXZAwA    bye
R_3G2sp6TEXZmf2KI  hello

CodePudding user response:

df.apply(lambda x : x[x.last_valid_index()], 1)

More methods: https://stackoverflow.com/a/46520070/8170215

  • Related