How to split a column and add additional rows from the split values in pandas?

Time:07-26

I have a dataframe as:

{'last_name': {0: 'Acosta-Arriola',
  1: 'Afragola',
  2: 'Bertolini',
  3: 'Coyle',
  4: 'Davis',
  10: 'Duntz',
  11: 'Eastman',
  12: 'Fitzgerald',
  13: 'Fitzgerald',
  14: 'Freeman',
  15: 'Freeman',
  16: 'Gambardella',
  17: 'Kelleher',
  18: 'King',
  19: 'Looney',
  20: 'Mccann',
  21: 'Murray',
  22: 'Palmeri',
  23: 'Powers',
  24: 'Vitelli',
  25: 'Wyzykowski'},
 'first_name_or_initial': {0: 'Jose',
  1: 'Sarah',
  2: 'Peter',
  3: 'James',
  4: 'Albert',
  10: 'Shawn',
  11: 'Bryan',
  12: 'Richard',
  13: 'Richard',
  14: 'Matthew',
  15: 'Matthew',
  16: 'Vincent',
  17: 'Robert',
  18: 'Thomas',
  19: 'Ray',
  20: 'Joseph',
  21: 'Joshua',
  22: 'Randy',
  23: 'Dennis',
  24: 'Robert',
  25: 'John'},
 'middle_name_or_initial': {0: 'Lusi;Luis',
  1: 'R.;B.',
  2: 'M.;Mario',
  3: 'M.;Michael',
  4: 'Chadbourne;C.',
  10: 'R.;Richard',
  11: 'J.;James',
  12: 'M.;J.;Micha',
  13: 'M.;Michael',
  14: 'Christopher;Robert',
  15: 'Christopher;C.',
  16: 'A.;Anthony',
  17: 'S.;Steven',
  18: 'E.;Emory',
  19: 'S.;Scott',
  20: 'M.;Michael',
  21: 'M.;P.',
  22: 'T.;Thomas',
  23: 'E.;Edward',
  24: 'J.;D.',
  25: 'J.;James'},
 'Suffix': {0: '',
  1: '',
  2: '',
  3: '',
  4: 'Jr.',
  10: '',
  11: '',
  12: '',
  13: '',
  14: '',
  15: '',
  16: '',
  17: '',
  18: 'Jr.',
  19: 'Jr.',
  20: '',
  21: '',
  22: '',
  23: 'Jr.',
  24: '',
  25: ''},
 'address_1': {0: '',
  1: '51 Indigo Trail',
  2: '90 Cherry Street;1295 Great Hill Road;90 Cherry Street',
  3: '51 Canary Court;51 Canary Court;687 Main Street',
  4: '39 Hemenway Street',
  10: '118 Brookside Avenue;9886 171 Street Place',
  11: '616 East Main Street;989 Boston Post Road;38 Mallard Court;1421 Naugatuck Avenue',
  12: '',
  13: '18 Fox Ridge Lane;18 Fox Ridge;18 Fox Ridge Road',
  14: '',
  15: '',
  16: '45 Jakobs Landing',
  17: '171 Williams Road;181 Knob Hill Road',
  18: '31 Millwood Drive;31 Millwood Drive;41 Waverly Park Road;31 Millwood Drive;25 Crouch Road',
  19: '',
  20: '17 Pheasant Run;25 Mcdermott Road;PO Box 510;17 Pheasant Run',
  21: '42 Seymour Street;42 Seymour Stt',
  22: '205 Mccall Road',
  23: '204 Milton Avenue;187 Milton Avenue',
  24: '16 Montgomery Drive',
  25: '457 Hill Street;139 County Line Road'}}


Here I would like to split the column middle_name_or_initial on the semicolon delimiter ';'.

After splitting, I would like one additional row for each split value, with the other columns repeated unchanged.

for example:

Duntz   Shawn   R.;Richard 118 Brookside Avenue;9886 171 Street Place

should be

1. Duntz - Shawn - R. - 118 Brookside Avenue;9886 171 Street Place

2. Duntz - Shawn - Richard - 118 Brookside Avenue;9886 171 Street Place

CodePudding user response:

# split the middle name on ';' so each cell holds a list of names
df['middle_name_or_initial'] = df['middle_name_or_initial'].str.split(';')

# explode the lists: one row per name, other columns are repeated
df_new = df.explode('middle_name_or_initial')

For details, see the pandas documentation for DataFrame.explode().
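
To see the result end to end, here is a minimal self-contained sketch that uses only two rows copied from the question's data. The fresh 0-based index and the final reset_index(drop=True) are assumptions added for illustration; without the reset, the exploded rows keep the index label of the row they came from.

```python
import pandas as pd

# Two rows copied from the question's data (index labels are not preserved here)
df = pd.DataFrame({
    'last_name': ['Duntz', 'Fitzgerald'],
    'first_name_or_initial': ['Shawn', 'Richard'],
    'middle_name_or_initial': ['R.;Richard', 'M.;J.;Micha'],
    'address_1': ['118 Brookside Avenue;9886 171 Street Place', ''],
})

df_new = (
    # turn each 'a;b' string into a list ['a', 'b'] without modifying df
    df.assign(middle_name_or_initial=df['middle_name_or_initial'].str.split(';'))
      # one row per list element; the other columns are repeated
      .explode('middle_name_or_initial')
      # optional: replace the duplicated index labels with 0..n-1
      .reset_index(drop=True)
)

print(df_new)
```

The address_1 column is left as a single semicolon-separated string in every row, which matches the expected output in the question. DataFrame.explode() requires pandas 0.25 or later.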
