How to write the values to another column of dataframe based on Row_id column and value exist in mat-CodePudding

I have a dataframe like this which is having ROW_ID and Matches columns. Based on the value in each row of Matches column I should write in Result column. For example, in first row, we have ; ALL MATCH -3, so in the new column Result, this ; ALL MATCH should be present in ROW_ID 3. In 8th ROW_ID, we have ; ALL MATCH -9; Diff in# -10. So in our Result column ; ALL MATCH should be present in ROW_ID 9 and ; Diff in# should be present in ROW_ID 10

ROW_ID	Matches
1	; ALL MATCH -3
2
3
4
5	; ALL MATCH -6
6
7
8	; ALL MATCH -9; Diff in# -10
9
10

That means the final dataframe should be like this.

ROW_ID	Result
1
2
3	; ALL MATCH
4
5
6	; ALL MATCH
7
8
9	; ALL MATCH
10	; Diff in#

I tried a lot, I extracted the int value seperately and other parts separately for each row using dataframe.iterrows(). But I am not able to print that value to a particular position. df.at[] method won't work. loc and iloc also i tried, but not getting how can i print that string to particular row of that column.

CodePudding user response：

Try:

df['Result'] = df['ROW_ID'].map(
    df['Matches'].str.extractall('(; [^-] ) -(\d )')
                 .astype({1: int}).set_index(1).squeeze()
).fillna('')

Output:

>>> df
   ROW_ID                       Matches       Result
0       1                ; ALL MATCH -3             
1       2                                           
2       3                                ; ALL MATCH
3       4                                           
4       5                ; ALL MATCH -6             
5       6                                ; ALL MATCH
6       7                                           
7       8  ; ALL MATCH -9; Diff in# -10             
8       9                                ; ALL MATCH
9      10                                 ; Diff in#

# Details about extractall
>>> df['Matches'].str.extractall('(; [^-] ) -(\d )')
                   0   1
  match                 
0 0      ; ALL MATCH   3
4 0      ; ALL MATCH   6
7 0      ; ALL MATCH   9
  1       ; Diff in#  10

CodePudding user response：

Create a temporary DataFrame as:

wrk = df.Matches.str.extractall(r'(?P<Result>;\D )-(?P<id>\d )')

Then strip the trailing spaces from Result column:

wrk.Result = wrk.Result.str.strip()

The next step is to change the type of id column to int, as so far it is of object type (actually a string):

wrk.id = wrk.id.astype('int64')

and set it as the index:

wrk.set_index('id', inplace=True)

Now wrk is actually a Series, containing:

         Result
id             
3   ; ALL MATCH
6   ; ALL MATCH
9   ; ALL MATCH
10   ; Diff in#

Then, to generate the result, run:

res = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True)

The result is:

   ROW_ID                       Matches       Result
0       1                ; ALL MATCH -3          NaN
1       2                           NaN          NaN
2       3                           NaN  ; ALL MATCH
3       4                           NaN          NaN
4       5                ; ALL MATCH -6          NaN
5       6                           NaN  ; ALL MATCH
6       7                           NaN          NaN
7       8  ; ALL MATCH -9; Diff in# -10          NaN
8       9                           NaN  ; ALL MATCH
9      10                           NaN   ; Diff in#

If you don't want "NaN" in "not filled" fields, append .fillna('') to the last instruction.