Reproducible dataframe-
import pandas as pd
data = {'refid': ['1.2.34',
'1.2.35',
'1.3.66',
'1.6.99',
'1.9.00',
'1.87.66',
'1.98.00',
'1.100.1',
'1.101.3'],
}
my_index = pd.MultiIndex.from_arrays([["A"]*6 ["B"]*3, [1, 1, 1, 2, 2, 2, 1, 1, 1]], names=["ID-A", "ID-B"])
df = pd.DataFrame(data, index=my_index)
I want a new column which clubs both ID-B and refid to second delimiter. ex- for ID-B 1 and refid 1.2.34, first the secondary-refid column should be 1.2 and unique ID should be 1_1.2
CodePudding user response:
You can use get_level_values
with str.extract
and concatenate the values converted as string:
df['new'] = (df.index.get_level_values('ID-B').astype(str) '_'
df['refid'].str.extract('(\d \.\d )', expand=False)
)
output:
refid new
ID-A ID-B
A 1 1.2.34 1_1.2
1 1.2.35 1_1.2
1 1.3.66 1_1.3
2 1.6.99 2_1.6
2 1.9.00 2_1.9
2 1.87.66 2_1.87
B 1 1.98.00 1_1.98
1 1.100.1 1_1.100
1 1.101.3 1_1.101