Assign value to column and reset after nth row

Time:11-23

I have a pandas dataframe that looks like this...

index my_column
0
1
2
3
4
5
6

What I need to do is assign values to 'my_column' based on the row index. The first three rows should get the values 'dog', 'cat', 'bird'; the next three rows should get 'dog', 'cat', 'bird' again, and that pattern should repeat until the end of the dataset, like this:

index my_column
0 dog
1 cat
2 bird
3 dog
4 cat
5 bird
6 dog
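To try the answers below, the starting frame and the target values can be rebuilt as follows. This is a minimal sketch assuming a default RangeIndex and an empty column filled with None (the question leaves the cells blank); `labels` and `expected` are illustrative names. The key observation is that row i simply gets the label at position i % 3.

```python
import pandas as pd

# Starting frame from the question: seven rows, default RangeIndex 0..6,
# and an empty my_column (None is an assumption; the question leaves it blank).
df = pd.DataFrame({"my_column": [None] * 7})

# Target values: the three labels repeated and cut off at the frame's length,
# i.e. row i gets the label at position i % 3.
labels = ["dog", "cat", "bird"]
expected = [labels[i % len(labels)] for i in range(len(df))]
print(expected)
```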

I've tried the following code to no avail.

for index, row in df.iterrows():
    counter=3
    my_column='dog'
    if counter>3
    break
    else 
    counter +=1
    my_column='cat'
    counter +=1
    if counter>3
    break
    else 
    counter +=1
    my_column='bird'
    if counter>3
    break  

Answer:

Create a dictionary:

pet_dict = {0:'dog',
            1:'cat',
            2:'bird'}

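Inside apply with axis=1, each row is passed in as a Series whose .name is its index label. Take that label modulo 3 with the % operator and look the result up in the dictionary to get your desired result: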

df.apply(lambda x: pet_dict[x.name % 3], axis=1)
0     dog
1     cat
2    bird
3     dog
4     cat
5    bird
6     dog
7     cat
8    bird
9     dog
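The same modulus-and-dictionary idea can also be applied to the whole index at once instead of row by row with apply. This is a sketch assuming the default integer RangeIndex (a ten-row frame, to match the output above); it swaps the answer's lambda for Index.map, which accepts a dict directly.

```python
import pandas as pd

pet_dict = {0: "dog", 1: "cat", 2: "bird"}
df = pd.DataFrame({"my_column": [None] * 10})

# Compute every row's position in the cycle in one step (index % 3),
# then translate positions to labels with Index.map and the dictionary.
# Assumes df has the default integer RangeIndex (0, 1, 2, ...).
df["my_column"] = (df.index % len(pet_dict)).map(pet_dict)
print(df["my_column"].tolist())
```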

Answer:

Several problems:

  1. Your if syntax is incorrect: you are missing colons and proper indentation.
  2. You use break, which exits the loop early; what you need instead is an if, elif, else structure.
  3. You are trying to update your dataframe while iterating over it.

See this question about why you shouldn't update while you iterate.
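Taking points 1 to 3 together, the counter idea from the question could be repaired roughly as follows. This is a sketch, not the answerer's code: the `labels` list and the wrap-around `counter` are illustrative choices, and the column is assigned once after the loop rather than while iterating over the frame.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [None] * 7})

labels = []
counter = 0  # position within the current dog-cat-bird group: 0, 1 or 2
for _ in range(len(df)):
    # if/elif/else picks the label for this row instead of using break
    if counter == 0:
        labels.append("dog")
    elif counter == 1:
        labels.append("cat")
    else:
        labels.append("bird")
    # advance the counter and wrap back to 0 after every third row
    counter = (counter + 1) % 3

# Assign the whole column once, after the loop, instead of
# mutating df row by row while iterating over it.
df["my_column"] = labels
print(df["my_column"].tolist())
```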

Instead, you could do

values = ["dog", "cat", "bird"]

num_values = len(values)

for index in df.index:
    # assumes integer index labels 0, 1, 2, ... (the default RangeIndex)
    df.at[index, "my_column"] = values[index % num_values]
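The loop above relies on the index labels being the integers 0, 1, 2, .... If that is not guaranteed, a variant (not part of the original answer) can take the row position from enumerate instead of from the label; the string index below is a made-up example.

```python
import pandas as pd

values = ["dog", "cat", "bird"]
num_values = len(values)

# Hypothetical frame with a string index, where index % num_values
# would raise a TypeError because the labels are not integers.
df = pd.DataFrame({"my_column": [None] * 5}, index=list("abcde"))

# enumerate supplies the row position separately from the index label,
# so the cycle works for any kind of index.
for position, label in enumerate(df.index):
    df.at[label, "my_column"] = values[position % num_values]

print(df["my_column"].tolist())
```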
    

Answer:

Advanced indexing

One solution would be to turn dog-cat-bird into a pd.Series and use advanced indexing:

dcb = pd.Series(["dog", "cat", "bird"])

df["my_column"] = dcb[df.index % len(dcb)].reset_index(drop=True)

This works by first creating an index array from df.index % len(dcb):

In [8]: df.index % len(dcb)
Out[8]: Int64Index([0, 1, 2, 0, 1, 2, 0], dtype='int64')

Then, by using advanced indexing, you can select the elements from dcb with that index array:

In [9]: dcb[df.index % len(dcb)]
Out[9]:
0     dog
1     cat
2    bird
0     dog
1     cat
2    bird
0     dog
dtype: object

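Finally, notice that the index of the above Series repeats. When you assign a Series to a column, pandas aligns it on the index, and duplicate labels prevent that, so reset the index and drop the old one with .reset_index(drop=True) before assigning the result to your dataframe.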
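To see why the reset is needed, the sketch below (assuming the seven-row frame from the question) first tries to assign the repeated-index Series directly. Because pandas aligns a Series on the index when assigning a column, the duplicate labels make that assignment raise a ValueError; after reset_index(drop=True) it succeeds. The `picked` and `failed` names are illustrative.

```python
import pandas as pd

dcb = pd.Series(["dog", "cat", "bird"])
df = pd.DataFrame({"my_column": [None] * 7})

# Label lookup with the index array; the result's index is 0, 1, 2, 0, 1, 2, 0.
picked = dcb[df.index % len(dcb)]

# Direct assignment must align picked's index to df's index,
# which fails because picked has duplicate labels.
failed = False
try:
    df["my_column"] = picked
except ValueError as err:
    failed = True
    print("assignment without reset_index failed:", err)

# With the repeated index dropped, the Series lines up with df row by row.
df["my_column"] = picked.reset_index(drop=True)
print(df["my_column"].tolist())
```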

Using a generator

Here's an alternate solution using an infinite dog-cat-bird generator:

In [2]: df
Out[2]:
  my_column
0
1
2
3
4
5
6

In [3]: def dog_cat_bird():
   ...:     while True:
   ...:         yield from ("dog", "cat", "bird")
   ...:

In [4]: dcb = dog_cat_bird()

In [5]: df["my_column"].apply(lambda _: next(dcb))
Out[5]:
0     dog
1     cat
2    bird
3     dog
4     cat
5    bird
6     dog
Name: my_column, dtype: object
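The dog_cat_bird generator does what itertools.cycle already provides in the standard library. A sketch using cycle and islice, assuming the same seven-row frame, is below. One difference worth noting: dcb above is a single stateful generator, so running In [5] a second time would continue from where it stopped ('cat') rather than restart at 'dog', whereas this version builds a fresh iterator on every run.

```python
from itertools import cycle, islice

import pandas as pd

df = pd.DataFrame({"my_column": [None] * 7})

# cycle repeats the sequence forever (like dog_cat_bird above);
# islice stops it after len(df) items so it can be turned into a list.
df["my_column"] = list(islice(cycle(["dog", "cat", "bird"]), len(df)))
print(df["my_column"].tolist())
```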