pandas loc doesn't change every column value in given range

Time:05-26

I'm trying to use pandas' loc to change some column's value.

I have a main df which has about 200k rows with the following structure: [col1, col2, col3, col4, col5].

I need to change some of the values of col4 and col5 based on the number of rows with value val. In pseudocode it would be something like this:

for each row in dataframe:
    if col2 == value:
        then col4 and col5 change its value

I wrote a method that splits the main dataframe into smaller dataframes, one per col2 value, changes col4 and col5 in each of them, and then concatenates them again. In each smaller dataframe I'm using pandas loc like this:

smaller_df.loc[range_to_change_col4, 'col4'] = new_col4_value
smaller_df.loc[range_to_change_col5, 'col5'] = new_col5_value

Data sample:

Original ->
class;id;url;aug;iterations
image_class;1;image_url;0;0
Expected ->
class;id;url;aug;iterations
image_class;1;image_url;1;1

Code sample:

# Ratio of the number of images I need to add
# to the number of images I already have
if images_to_add / df.shape[0] < 1:
    # Pick random rows and keep their index labels
    to_update = df.sample(
        n=images_to_add,  # number of images I need to create
        replace=True,
        random_state=1,
    ).index
    # Mark the real image as one that will be augmented
    df.loc[to_update, 'aug'] = 1
    # How many times the real image will be augmented
    df.loc[to_update, 'iterations'] = 1

My problem is that in some of the smaller dataframes not all of the selected rows get their values updated. I'm relatively new to pandas and I don't know what the problem is. Could it be a memory problem? Any idea how I could avoid this?

CodePudding user response:

I would use assign, like:

import numpy as np
df = df.assign(
    new_column_1=lambda x: x["column name"] * 1.3,
    new_column_2=lambda x: x["column name"] + x["column name"],
    new_column_3=lambda x: x["column name"].str.replace("2", "3") + x["column name"],
    new_column_4=lambda x: np.where(x["Voorraad"] > 8, 8, x["Voorraad"]),
)

CodePudding user response:

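SOLVED: the problem was the replace parameter of df.sample. With replace=True the same row can be drawn more than once, so to_update can contain fewer unique index labels than the number requested, and fewer rows get updated. Setting it to False solved the problem and now I can change every value.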
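
A minimal sketch of the effect, for anyone hitting the same thing. The toy dataframe and the value of n are made up for illustration; only the column names come from the question. It compares sampling with and without replacement and counts how many rows loc actually changes. The last part is just one possible way to keep replace=True, assuming you want 'iterations' to hold how many times each row was drawn.

```python
import pandas as pd

# Hypothetical toy data using the column names from the question.
df = pd.DataFrame({
    "class": ["image_class"] * 5,
    "id": range(1, 6),
    "url": ["image_url"] * 5,
    "aug": 0,
    "iterations": 0,
})

n = 4  # made-up number of rows to pick

# With replacement the same index label may be drawn several times.
with_repl = df.sample(n=n, replace=True, random_state=1).index
# Without replacement the n drawn labels are all distinct.
without_repl = df.sample(n=n, replace=False, random_state=1).index

a = df.copy()
a.loc[with_repl, "aug"] = 1      # updates only the unique labels drawn
b = df.copy()
b.loc[without_repl, "aug"] = 1   # updates exactly n rows

print("replace=True :", with_repl.nunique(), "unique labels,",
      int(a["aug"].sum()), "rows updated")
print("replace=False:", without_repl.nunique(), "unique labels,",
      int(b["aug"].sum()), "rows updated")

# Possible alternative (an assumption about the intent): keep replace=True
# and store how many times each row was drawn in 'iterations'.
counts = with_repl.value_counts()
a.loc[counts.index, "iterations"] = counts.to_numpy()
```

With replace=False, n must not exceed the number of rows, which matches the ratio check in the question's code.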
