I have a dataframe say:
Example:
import pandas as pd
df = pd.DataFrame({'Item': ['California', '2012%', '2013%','Arizona','2012%',' 19','Janu%ary'],
'col1': [0,50, 50,0,10,11,14],'col2': [0, 50, 40,0,15,13,15]})
Output=
Item col1 col2
1 California 0 0
2 2012% 50 50
3 2013% 40 40
4 Arizona 0 0
5 2012%. 10. 15
6. 19. 11. 13
7. Janu%ary. 14. 15
I want the column names like " California" and "Arizona" (the ones that do not have "%" in the column values to be considered as Headers that has to be appended to their respective sub-headers. Like maybe iterate down the rows and find a pattern e.g. without ‘%’ in row means its a header, with ‘%’ means its a sub-header then for the ‘sub-header’ rows, add the last found ‘header’.
Expected output=
Item col1 col2
1 California 2012% 50 50
2 California 2013% 40 40
3 Arizona 2012%. 10. 15
4 Arizona 2019%. 11. 13
5 Arizona January%. 14. 15
CodePudding user response:
IIUC, you could use a mask and perform boolean masking/indexing:
# does the name contains '%' (you could use other conditions)
m = df['Item'].str.contains('%')
# mask and ffill the "header", then concatenate
df['Item'] = df['Item'].mask(m).ffill() ' ' df['Item']
# drop the former header rows
df = df.loc[m]
output:
Item col1 col2
1 California 2012% 50 50
2 California 2013% 50 40
4 Arizona 2012% 10 15
5 Arizona 2019% 11 13
6 Arizona January% 14 15
alternative to have a real index:
m = df['Item'].str.contains('%')
df['index'] = df['Item'].mask(m).ffill()
df = df.loc[m].set_index('index')
output:
Item col1 col2
index
California 2012% 50 50
California 2013% 50 40
Arizona 2012% 10 15
Arizona 2019% 11 13
Arizona January% 14 15