compare strings in the same row but in different columns and catch in which column they are not equa-CodePudding

I have a Dataset like this:

dictionary = {'Month1': ['C1','C2',0,0,'C5'], 'Month2': ['C1','C2','C3','C4',0], 'Month3': ['C1','C2','C3','C4',0], 'Month4' : [0,'C2','C3',0,0]}
df = pd.DataFrame(dictionary)


    Month1  Month2  Month3  Month4
0   C1        C1     C1       0
1   C2        C2     C2       C2
2   0         C3     C3       C3
3   0         C4     C4        0
4   C5         0      0        0

I want to compare each row between all the columns of this DataFrame, i.e. compare if df.loc[0,'Month1'] is equal to: df.loc[0,'Month2'], df.loc[0,'Month3'] and df.loc[0,'Month2']. I want to do this because my goal with this comparison is to know from and to what month there is the same string, I would like to put the initial month that the value appear in one column and the last month that the string is repeated in other one. Someone like this:

    Month1  Month2  Month3  Month4  Firts appear last appear
0   C1        C1     C1        0        Month1      Month3
1   C2        C2     C2       C2        Month1      Month4
2   0         C3     C3       C3        Month2      Month4
3   0         C4     C4        0        Month2      Month3
4   C5         0      0        0        Month1      Month1

I'm thinking in uses np.where() or a loop but I really don't know how to do it. Please help me

CodePudding user response：

Let us do

s = df.ne(0).dot(df.columns ',').str[:-1].str.split(',')
df['1st'] = s.str[0]
df['-1st'] = s.str[-1]

Notice you can also do 2nd

df['2nd'] = s.str[1]

CodePudding user response：

Use idxmax

>>> df.assign(first_appear=df.ne(0).idxmax(1),
              last_appear=df.loc[:, df.columns[::-1]].ne(0).idxmax(1))

    Month1  Month2  Month3  Month4  Firts appear last appear
0   C1        C1     C1        0        Month1      Month3
1   C2        C2     C2       C2        Month1      Month4
2   0         C3     C3       C3        Month2      Month4
3   0         C4     C4        0        Month2      Month3
4   C5         0      0        0        Month1      Month1

CodePudding user response：

Another solution:

df[["First appear", "Last appear"]] = df.apply(
    lambda x: [(idx := x.index[x.ne(0)])[0], idx[-1]],
    axis=1,
    result_type="expand",
)
print(df)

Prints:

  Month1 Month2 Month3 Month4 First appear Last appear
0     C1     C1     C1      0       Month1      Month3
1     C2     C2     C2     C2       Month1      Month4
2      0     C3     C3     C3       Month2      Month4
3      0     C4     C4      0       Month2      Month3
4     C5      0      0      0       Month1      Month1

CodePudding user response：

You can do:

s = (df[df.ne(0)].apply(lambda row: 
        (row.first_valid_index(), row.last_valid_index()), axis=1))
df['First Appear'], df['Last Appear'] = s.str[0], s.str[1]

Output:

  Month1 Month2 Month3 Month4 First Appear Last Appear
0     C1     C1     C1      0       Month1      Month3
1     C2     C2     C2     C2       Month1      Month4
2      0     C3     C3     C3       Month2      Month4
3      0     C4     C4      0       Month2      Month3
4     C5      0      0      0       Month1      Month1