tweet_id | image_url | doggo | floofer | puppa | puppo |
---|---|---|---|---|---|
First | row | none | none | none | none |
Second | row | doggo | none | none | none |
third | row | none | floofer | none | none |
fourth | row | none | none | puppa | none |
fifth | row | none | none | none | puppo |
I have a situation trying to melt the above df columns [doggo,floofer,puppa,puppo] headers which ought to be values. I want to be able to retain the rows count and also have a new_column as table below.
tweet_id | image_url | breed |
---|---|---|
First | row | none |
Second | row | doggo |
third | row | floofer |
fourth | row | puppa |
fifth | row | puppo |
I already tried the pd.melt() but it doesn't return desired result. Thank you
CodePudding user response:
I assume your none
is string.
df = df.replace('none', np.nan)
df = df.set_index(['tweet_id','image_url'])
output = df.apply(pd.Series.first_valid_index, axis=1).reset_index()
output.columns = ['tweet_id','image_url', 'breed']
output
###
tweet_id image_url breed
0 First row None
1 Second row doggo
2 third row floofer
3 fourth row puppa
4 fifth row puppo
CodePudding user response:
If you are not familiar with pandas function, you can use a nested loop to get the same result too.
df = df.replace('none', np.nan)
df = df.set_index(['tweet_id','image_url'])
result = []
for row in df.to_numpy():
if pd.isna(row).all():
result.append(np.nan)
for breed in row:
if pd.notna(breed):
result.append(breed)
break
[nan, 'doggo', 'floofer', 'puppa', 'puppo']
df['breed'] = result
df.reset_index(inplace=True)
tweet_id image_url doggo floofer puppa puppo breed
0 First row NaN NaN NaN NaN NaN
1 Second row doggo NaN NaN NaN doggo
2 third row NaN floofer NaN NaN floofer
3 fourth row NaN NaN puppa NaN puppa
4 fifth row NaN NaN NaN puppo puppo