Home > Blockchain >  Loop through the columns of a Dataframe to modify a slice of it
Loop through the columns of a Dataframe to modify a slice of it

Time:10-01

I Have the following dataframe and I am trying to modify a slice of it by iterating through the columns using a for loop.

data = {'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}

df = pd.DataFrame(data)

for col in df:
    if col =='id':
        continue
    else:
        df.loc[df['employment']=='12-15',col] = df[col].str.split('-').str[0]

But I am experiencing something strange where after running the loop, it seems like it doesn't affect all the columns. I am expecting this:

#Expected

pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece', 'New York'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens', 'Albany']})

But I am getting this instead:

pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']})

CodePudding user response:

Don't use for loop unless you have to do, plus the line

df.loc[df['employment']=='12-15',col] = df[col].str.split('-').str[0]

is problematic because on the left hand side you restrict the dataframe to a specific rows while on the right hand side you work with the whole rows on a specific column, use apply instead:

valid_cols = df.columns.drop('id')
df.loc[df['employment']=='12-15',valid_cols] =\ 
                                      df.loc[df['employment']=='12-15',valid_cols].\ 
                                      apply(lambda x: x.str.split('-').str[0])

CodePudding user response:

If you are interested to know what wrong is there in your code then follow below one;

Issue actually was in the loop... It updates the 'employment' column too before updating last 2 columns... & when it comes to the their turn then their is actually no values of '12-15' in employement column as they got updated to '12'... So, just changing the order of loop of columns in your code will solve the problem, where 'employment' will update at the end...

lst_cols = list(df.columns)
lst_cols.remove('employment')
lst_cols =  lst_cols   ['employment']

for col in lst_cols:
    if col =='id':
        continue
    else:
        df.loc[df['employment']=='12-15',col] = df[col].str.split('-').str[0]
  • Related