I have a Dataset like this:
dictionary = {'Month1': ['C1','C2',0,0,'C5'], 'Month2': ['C1','C2','C3','C4',0], 'Month3': ['C1','C2','C3','C4',0], 'Month4' : [0,'C2','C3',0,0]}
df = pd.DataFrame(dictionary)
Month1 Month2 Month3 Month4
0 C1 C1 C1 0
1 C2 C2 C2 C2
2 0 C3 C3 C3
3 0 C4 C4 0
4 C5 0 0 0
I want to compare each row between all the columns of this DataFrame, i.e. compare if df.loc[0,'Month1']
is equal to: df.loc[0,'Month2']
, df.loc[0,'Month3']
and df.loc[0,'Month2']
. I want to do this because my goal with this comparison is to know from and to what month there is the same string, I would like to put the initial month that the value appear in one column and the last month that the string is repeated in other one. Someone like this:
Month1 Month2 Month3 Month4 Firts appear last appear
0 C1 C1 C1 0 Month1 Month3
1 C2 C2 C2 C2 Month1 Month4
2 0 C3 C3 C3 Month2 Month4
3 0 C4 C4 0 Month2 Month3
4 C5 0 0 0 Month1 Month1
I'm thinking in uses np.where()
or a loop but I really don't know how to do it. Please help me
CodePudding user response:
Let us do
s = df.ne(0).dot(df.columns ',').str[:-1].str.split(',')
df['1st'] = s.str[0]
df['-1st'] = s.str[-1]
Notice you can also do 2nd
df['2nd'] = s.str[1]
CodePudding user response:
Use idxmax
>>> df.assign(first_appear=df.ne(0).idxmax(1),
last_appear=df.loc[:, df.columns[::-1]].ne(0).idxmax(1))
Month1 Month2 Month3 Month4 Firts appear last appear
0 C1 C1 C1 0 Month1 Month3
1 C2 C2 C2 C2 Month1 Month4
2 0 C3 C3 C3 Month2 Month4
3 0 C4 C4 0 Month2 Month3
4 C5 0 0 0 Month1 Month1
CodePudding user response:
Another solution:
df[["First appear", "Last appear"]] = df.apply(
lambda x: [(idx := x.index[x.ne(0)])[0], idx[-1]],
axis=1,
result_type="expand",
)
print(df)
Prints:
Month1 Month2 Month3 Month4 First appear Last appear
0 C1 C1 C1 0 Month1 Month3
1 C2 C2 C2 C2 Month1 Month4
2 0 C3 C3 C3 Month2 Month4
3 0 C4 C4 0 Month2 Month3
4 C5 0 0 0 Month1 Month1
CodePudding user response:
You can do:
s = (df[df.ne(0)].apply(lambda row:
(row.first_valid_index(), row.last_valid_index()), axis=1))
df['First Appear'], df['Last Appear'] = s.str[0], s.str[1]
Output:
Month1 Month2 Month3 Month4 First Appear Last Appear
0 C1 C1 C1 0 Month1 Month3
1 C2 C2 C2 C2 Month1 Month4
2 0 C3 C3 C3 Month2 Month4
3 0 C4 C4 0 Month2 Month3
4 C5 0 0 0 Month1 Month1