Home > database >  map select rows to a new column
map select rows to a new column

Time:11-11

I have a data frame that looks like this

table = {'0': {6: 'Becks', 7: '307NRR', 8: '321NRR', 9: '342NRR', 10: 'Campbell', 11: '329NRR', 12: '347NRR', 13: 'Crows', 14: 'C3001R'}, '1': {6: nan, 7: 'R', 8: 'R', 9: 'R', 10: nan, 11: 'R', 12: 'R', 13: nan, 14: 'R'}, '2': {6: nan, 7: 'CM,SG', 8: 'CM,SG', 9: 'CM,SG', 10: nan, 11: 'None', 12: 'None', 13: nan, 14: 'None'}, '3': {6: nan, 7: 3.0, 8: 3.2, 9: 3.4, 10: nan, 11: 3.2, 12: 3.4, 13: nan, 14: 3.0}}
           0    1      2    3
6      Becks  NaN    NaN  NaN
7     307NRR    R  CM,SG  3.0
8     321NRR    R  CM,SG  3.2
9     342NRR    R  CM,SG  3.4
10  Campbell  NaN    NaN  NaN
11    329NRR    R   None  3.2
12    347NRR    R   None  3.4
13     Crows  NaN    NaN  NaN
14    C3001R    R   None  3.0

I want to use rows with names Becks, Campbell, Crows as a seperate column to name the entries below them. So the result would look like:

Becks       307NRR    R  CM,SG  3.0
Becks       321NRR    R  CM,SG  3.2
Becks       342NRR    R  CM,SG  3.4
Campbell    329NRR    R   None  3.2
Campbell    347NRR    R   None  3.4
Crows       C3001R    R   None  3.0

What's a good way to do with with panda/base modules?

CodePudding user response:

Try:

# your header rows
headers = df.iloc[:, 1:].isna().all(1)

# propagate the headers to the rows below it
df['name'] = df.groupby(headers.cumsum())['0'].transform('first')
# also
# df['name'] = df.iloc[:,0].where(headers).ffill()

# drop the header rows
df = df[~headers]

Output:

         0  1      2    3      name
7   307NRR  R  CM,SG  3.0     Becks
8   321NRR  R  CM,SG  3.2     Becks
9   342NRR  R  CM,SG  3.4     Becks
11  329NRR  R   None  3.2  Campbell
12  347NRR  R   None  3.4  Campbell
14  C3001R  R   None  3.0     Crows

Note if you really care about column order, instead of df['name'] = ... you can use insert to match your expected output:

df.insert(0, 'name', df.iloc[:,0].where(headers).ffill())

And you get:

        name       0  1      2    3
7      Becks  307NRR  R  CM,SG  3.0
8      Becks  321NRR  R  CM,SG  3.2
9      Becks  342NRR  R  CM,SG  3.4
11  Campbell  329NRR  R   None  3.2
12  Campbell  347NRR  R   None  3.4
14     Crows  C3001R  R   None  3.0

CodePudding user response:

Lets Try

df['new_co'] =df['0'].shift()
        
df=df.assign(new_co=np.where(df['new_co'].str.contains('RR'),np.nan,df['new_co'])).dropna(thresh=3).ffill()



        0  1      2    3    new_co
7   307NRR  R  CM,SG  3.0     Becks
8   321NRR  R  CM,SG  3.2     Becks
9   342NRR  R  CM,SG  3.4     Becks
11  329NRR  R   None  3.2  Campbell
12  347NRR  R   None  3.4  Campbell
14  C3001R  R   None  3.0     Crows
  • Related