I have a DataFrame
Type Numer master width
xyz 465_0 123 305
xyz 465_0 123 305
xyz 465_0 123 305
xyz 465_0 123 315
xyz 465_1 123 305
xyz 465_1 123 305
xyz 465_1 123 305
xyz 465_1 123 315
xyz 465_2 123 305
xyz 465_2 123 305
xyz 465_2 123 305
xyz 465_2 123 315
xyz 465_3 123 305
xyz 465_3 123 305
xyz 465_3 123 305
xyz 465_3 123 315
From this I need the following DataFrame
Type Numer master width
xyz 465_0 123 305
xyz 465_1 123 305
xyz 465_2 123 305
xyz 465_3 123 315
My try is:
df[['Numer1', 'dig']] = df['Numer'].str.split("_", expand=True)
df = df.drop('Numer', axis = 1)
df.drop_duplicates()
But it is not giving me the result. I would like to write it in a generic way, because I have this for multiple types.
Data:
{'Type': ['xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz',
'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'],
'Numer': ['465_0', '465_0', '465_0', '465_0', '465_1', '465_1', '465_1', '465_1',
'465_2', '465_2', '465_2', '465_2', '465_3', '465_3', '465_3', '465_3'],
'master': [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123],
'width': [305, 305, 305, 315, 305, 305, 305, 315, 305, 305, 305, 315, 305, 305, 305, 315]}
CodePudding user response:
We could use groupby
cumcount
to create an group-specific ranking for each "Numer"; then filter the rows where the suffix in "Numer" matches the ranks in groups:
out = df[df['Numer'].str.split('_').str[1].astype(int) == df.groupby('Numer').cumcount()].drop(columns='rank')
Output:
Type Numer master width
0 xyz 465_0 123 305
5 xyz 465_1 123 305
10 xyz 465_2 123 305
15 xyz 465_3 123 315
CodePudding user response:
What I understood from your question is that you are trying to "collapse" the width
pattern in each Numer
group into a DataFrame
which only has the unique group (Numer
) values and the same width
pattern
One way to do this is using the zip_longest
function from the itertools
library after reversing the groups -
from itertools import zip_longest
rev_zip_longest = list(zip_longest(*[reversed(df[col].unique()) for col in df.columns]))
# [('xyz', '465_3', 123, 315),
# (None, '465_2', None, 305),
# (None, '465_1', None, None),
# (None, '465_0', None, None)]
df2 = pd.DataFrame(rev_zip_longest)
df2.columns = df.columns
df2 = df2.fillna(method='ffill')
# Type Numer master width
# 0 xyz 465_3 123.0 315.0
# 1 xyz 465_2 123.0 305.0
# 2 xyz 465_1 123.0 305.0
# 3 xyz 465_0 123.0 305.0
CodePudding user response:
this code works as well:
res = pd.concat([g.take([i.split('_')[1]]) for i,g in df.groupby('Numer')])
print(res)
'''
Type Numer master width
0 xyz 465_0 123 305
5 xyz 465_1 123 305
10 xyz 465_2 123 305
15 xyz 465_3 123 315