Home > other >  compare strings in the same row but in different columns and catch in which column they are not equa
compare strings in the same row but in different columns and catch in which column they are not equa

Time:06-07

I have a Dataset like this:

dictionary = {'Month1': ['C1','C2',0,0,'C5'], 'Month2': ['C1','C2','C3','C4',0], 'Month3': ['C1','C2','C3','C4',0], 'Month4' : [0,'C2','C3',0,0]}
df = pd.DataFrame(dictionary)


    Month1  Month2  Month3  Month4
0   C1        C1     C1       0
1   C2        C2     C2       C2
2   0         C3     C3       C3
3   0         C4     C4        0
4   C5         0      0        0

I want to compare each row between all the columns of this DataFrame, i.e. compare if df.loc[0,'Month1'] is equal to: df.loc[0,'Month2'], df.loc[0,'Month3'] and df.loc[0,'Month2']. I want to do this because my goal with this comparison is to know from and to what month there is the same string, I would like to put the initial month that the value appear in one column and the last month that the string is repeated in other one. Someone like this:

    Month1  Month2  Month3  Month4  Firts appear last appear
0   C1        C1     C1        0        Month1      Month3
1   C2        C2     C2       C2        Month1      Month4
2   0         C3     C3       C3        Month2      Month4
3   0         C4     C4        0        Month2      Month3
4   C5         0      0        0        Month1      Month1

I'm thinking in uses np.where() or a loop but I really don't know how to do it. Please help me

CodePudding user response:

Let us do

s = df.ne(0).dot(df.columns ',').str[:-1].str.split(',')
df['1st'] = s.str[0]
df['-1st'] = s.str[-1]

Notice you can also do 2nd

df['2nd'] = s.str[1]

CodePudding user response:

Use idxmax

>>> df.assign(first_appear=df.ne(0).idxmax(1),
              last_appear=df.loc[:, df.columns[::-1]].ne(0).idxmax(1))

    Month1  Month2  Month3  Month4  Firts appear last appear
0   C1        C1     C1        0        Month1      Month3
1   C2        C2     C2       C2        Month1      Month4
2   0         C3     C3       C3        Month2      Month4
3   0         C4     C4        0        Month2      Month3
4   C5         0      0        0        Month1      Month1

CodePudding user response:

Another solution:

df[["First appear", "Last appear"]] = df.apply(
    lambda x: [(idx := x.index[x.ne(0)])[0], idx[-1]],
    axis=1,
    result_type="expand",
)
print(df)

Prints:

  Month1 Month2 Month3 Month4 First appear Last appear
0     C1     C1     C1      0       Month1      Month3
1     C2     C2     C2     C2       Month1      Month4
2      0     C3     C3     C3       Month2      Month4
3      0     C4     C4      0       Month2      Month3
4     C5      0      0      0       Month1      Month1

CodePudding user response:

You can do:

s = (df[df.ne(0)].apply(lambda row: 
        (row.first_valid_index(), row.last_valid_index()), axis=1))
df['First Appear'], df['Last Appear'] = s.str[0], s.str[1]

Output:

  Month1 Month2 Month3 Month4 First Appear Last Appear
0     C1     C1     C1      0       Month1      Month3
1     C2     C2     C2     C2       Month1      Month4
2      0     C3     C3     C3       Month2      Month4
3      0     C4     C4      0       Month2      Month3
4     C5      0      0      0       Month1      Month1
  • Related