I'm struggling on the following case. I have a dataframe with columns containing NaN or a string value. How could I merge all 3 columns (i.e. Q3_4_4, Q3_4_5, and Q3_4_6 into one new column (i.e Q3_4) by only keeping the string value? This column would have 'hi' in row 1, 'bye' in row 2, and 'hello' in row3. Thank you for your help
{'Q3_4_4': {'R_00RfS8OP6QrTNtL': nan,
'R_3JtmbtdPjxXZAwA': nan,
'R_3G2sp6TEXZmf2KI': 'hello'
},
'Q3_4_5': {'R_00RfS8OP6QrTNtL': 'hi',
'R_3JtmbtdPjxXZAwA': nan,
'R_3G2sp6TEXZmf2KI': nan},
'Q3_4_6': {'R_00RfS8OP6QrTNtL': nan,
'R_3JtmbtdPjxXZAwA': 'bye',
'R_3G2sp6TEXZmf2KI': nan},
}
CodePudding user response:
If need join by columns names with removed last value after last _
in columns names use GroupBy.first
per axis=1
(per columns) with lambda for select by columns names spiller from right by first _
and selecting:
nan = np.nan
df = pd.DataFrame({'Q3_4_4': {'R_00RfS8OP6QrTNtL': nan,
'R_3JtmbtdPjxXZAwA': nan,
'R_3G2sp6TEXZmf2KI': 'hello'
},
'Q3_4_5': {'R_00RfS8OP6QrTNtL': 'hi',
'R_3JtmbtdPjxXZAwA': nan,
'R_3G2sp6TEXZmf2KI': nan},
'Q3_4_6': {'R_00RfS8OP6QrTNtL': nan,
'R_3JtmbtdPjxXZAwA': 'bye',
'R_3G2sp6TEXZmf2KI': nan},
})
df = df.groupby(lambda x: x.rsplit('_', 1)[0], axis=1).first()
print (df)
Q3_4
R_00RfS8OP6QrTNtL hi
R_3JtmbtdPjxXZAwA bye
R_3G2sp6TEXZmf2KI hello
CodePudding user response:
df.apply(lambda x : x[x.last_valid_index()], 1)
More methods: https://stackoverflow.com/a/46520070/8170215