I have a dataframe like this which is having ROW_ID and Matches columns. Based on the value in each row of Matches column I should write in Result column. For example, in first row, we have ; ALL MATCH -3, so in the new column Result, this ; ALL MATCH should be present in ROW_ID 3. In 8th ROW_ID, we have ; ALL MATCH -9; Diff in# -10. So in our Result column ; ALL MATCH should be present in ROW_ID 9 and ; Diff in# should be present in ROW_ID 10
ROW_ID | Matches |
---|---|
1 | ; ALL MATCH -3 |
2 | |
3 | |
4 | |
5 | ; ALL MATCH -6 |
6 | |
7 | |
8 | ; ALL MATCH -9; Diff in# -10 |
9 | |
10 |
That means the final dataframe should be like this.
ROW_ID | Result |
---|---|
1 | |
2 | |
3 | ; ALL MATCH |
4 | |
5 | |
6 | ; ALL MATCH |
7 | |
8 | |
9 | ; ALL MATCH |
10 | ; Diff in# |
I tried a lot, I extracted the int value seperately and other parts separately for each row using dataframe.iterrows(). But I am not able to print that value to a particular position. df.at[] method won't work. loc and iloc also i tried, but not getting how can i print that string to particular row of that column.
CodePudding user response:
Try:
df['Result'] = df['ROW_ID'].map(
df['Matches'].str.extractall('(; [^-] ) -(\d )')
.astype({1: int}).set_index(1).squeeze()
).fillna('')
Output:
>>> df
ROW_ID Matches Result
0 1 ; ALL MATCH -3
1 2
2 3 ; ALL MATCH
3 4
4 5 ; ALL MATCH -6
5 6 ; ALL MATCH
6 7
7 8 ; ALL MATCH -9; Diff in# -10
8 9 ; ALL MATCH
9 10 ; Diff in#
# Details about extractall
>>> df['Matches'].str.extractall('(; [^-] ) -(\d )')
0 1
match
0 0 ; ALL MATCH 3
4 0 ; ALL MATCH 6
7 0 ; ALL MATCH 9
1 ; Diff in# 10
CodePudding user response:
Create a temporary DataFrame as:
wrk = df.Matches.str.extractall(r'(?P<Result>;\D )-(?P<id>\d )')
Then strip the trailing spaces from Result column:
wrk.Result = wrk.Result.str.strip()
The next step is to change the type of id column to int, as so far it is of object type (actually a string):
wrk.id = wrk.id.astype('int64')
and set it as the index:
wrk.set_index('id', inplace=True)
Now wrk is actually a Series, containing:
Result
id
3 ; ALL MATCH
6 ; ALL MATCH
9 ; ALL MATCH
10 ; Diff in#
Then, to generate the result, run:
res = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True)
The result is:
ROW_ID Matches Result
0 1 ; ALL MATCH -3 NaN
1 2 NaN NaN
2 3 NaN ; ALL MATCH
3 4 NaN NaN
4 5 ; ALL MATCH -6 NaN
5 6 NaN ; ALL MATCH
6 7 NaN NaN
7 8 ; ALL MATCH -9; Diff in# -10 NaN
8 9 NaN ; ALL MATCH
9 10 NaN ; Diff in#
If you don't want "NaN" in "not filled" fields, append .fillna('')
to the last instruction.