I have a pandas dataframe that looks like this...
index | my_column |
---|---|
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
What I need to do is conditionally assign values to 'my_column' depending on the index. The first three rows should have the values 'dog', 'cat', 'bird'. Then, the next three rows should also have 'dog', 'cat', 'bird'. That pattern should apply until the end of the dataset.
index | my_column |
---|---|
0 | dog |
1 | cat |
2 | bird |
3 | dog |
4 | cat |
5 | bird |
6 | dog |
I've tried the following code to no avail.
for index, row in df.iterrows():
counter=3
my_column='dog'
if counter>3
break
else
counter =1
my_column='cat'
counter =1
if counter>3
break
else
counter =1
my_column='bird'
if counter>3
break
CodePudding user response:
Create a dictionary:
pet_dict = {0:'dog',
1:'cat',
2:'bird'}
Inside apply with axis=1, each row's index label is available as .name. Take it modulo 3 (%) and look the result up in the dictionary:
df.apply(lambda x: pet_dict[x.name % 3], axis=1)
0 dog
1 cat
2 bird
3 dog
4 cat
5 bird
6 dog
7 cat
8 bird
9 dog
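For completeness, here is a minimal self-contained sketch of the same idea that also writes the result back into the column. The 7-row frame with empty strings is an assumption made to mirror the question, and the lookup only works if the index labels are integers.

```python
import pandas as pd

# Assumed stand-in for the question's dataframe: 7 rows, default 0..6 index.
df = pd.DataFrame({"my_column": [""] * 7})

pet_dict = {0: "dog", 1: "cat", 2: "bird"}

# row.name is the row's index label; label % 3 selects the dictionary key.
df["my_column"] = df.apply(lambda row: pet_dict[row.name % 3], axis=1)

print(df)
```

Note that apply with axis=1 calls the lambda once per row, so this is convenient rather than fast on large frames.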
CodePudding user response:
Several problems:
- Your if syntax is incorrect: you are missing colons and proper indentation.
- You are break-ing out of your loop, terminating it early, instead of using an if, elif, else structure.
- You are trying to update your dataframe while iterating over it.
See this question about why you shouldn't update while you iterate.
Instead, you could do
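values = ["dog", "cat", "bird"]
num_values = len(values)
# df.index is an attribute, not a method, so it is not called
for index in df.index:
    # index % num_values cycles through 0, 1, 2, 0, 1, 2, ...
    df.at[index, "my_column"] = values[index % num_values]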
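The loop above treats each index label as a number, so it relies on a default integer index. As a hedged variant, the sketch below cycles on the row position from enumerate instead, which also works when the labels are not integers. The letter-labelled frame is an assumption used only for illustration.

```python
import pandas as pd

values = ["dog", "cat", "bird"]
num_values = len(values)

# Hypothetical frame with non-integer labels to show the difference.
df = pd.DataFrame({"my_column": [""] * 7}, index=list("abcdefg"))

# position counts rows from 0; label is whatever the index holds.
for position, label in enumerate(df.index):
    df.at[label, "my_column"] = values[position % num_values]

print(df)
```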
CodePudding user response:
Advanced indexing
One solution would be to turn dog-cat-bird into a pd.Series and use advanced indexing:
dcb = pd.Series(["dog", "cat", "bird"])
df["my_column"] = dcb[df.index % len(dcb)].reset_index(drop=True)
This works by first creating an index array from df.index % len(dcb):
In [8]: df.index % len(dcb)
Out[8]: Int64Index([0, 1, 2, 0, 1, 2, 0], dtype='int64')
Then, by using advanced indexing, you can select the elements from dcb with that index array:
In [9]: dcb[df.index % len(dcb)]
Out[9]:
0 dog
1 cat
2 bird
0 dog
1 cat
2 bird
0 dog
dtype: object
Finally, notice that the index of the above Series repeats. Reset it and drop the old index with .reset_index(drop=True), then assign the result to your dataframe.
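If the index bookkeeping is more than you need, one alternative (a different technique from the Series indexing above, not a restatement of it) is to let NumPy repeat the values to the right length with np.resize and assign the plain array. A minimal sketch, assuming the same 7-row frame as in the question:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's dataframe.
df = pd.DataFrame({"my_column": [""] * 7})

# np.resize repeats the input cyclically until it has len(df) elements.
pattern = np.resize(["dog", "cat", "bird"], len(df))

# Assigning a plain array is positional, so no index alignment takes place.
df["my_column"] = pattern

print(df)
```

Because the assignment is positional, this does not depend on what the index labels of df are.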
Using a generator
Here's an alternate solution using an infinite dog-cat-bird generator:
In [2]: df
Out[2]:
my_column
0
1
2
3
4
5
6
In [3]: def dog_cat_bird():
   ...:     while True:
   ...:         yield from ("dog", "cat", "bird")
   ...:
In [4]: dcb = dog_cat_bird()
In [5]: df["my_column"].apply(lambda _: next(dcb))
Out[5]:
0 dog
1 cat
2 bird
3 dog
4 cat
5 bird
6 dog
Name: my_column, dtype: object
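The standard library already ships an infinite repeater, itertools.cycle, so the hand-written generator is optional. Here is a small sketch that takes exactly len(df) items from it with islice and assigns them back to the column (the apply call above only displays the result). The 7-row frame is again an assumption that mirrors the question.

```python
from itertools import cycle, islice

import pandas as pd

# Assumed stand-in for the question's dataframe.
df = pd.DataFrame({"my_column": [""] * 7})

# cycle repeats the three values forever; islice stops after len(df) items.
df["my_column"] = list(islice(cycle(["dog", "cat", "bird"]), len(df)))

print(df)
```

Building the list up front also avoids the shared generator state that a second apply call would pick up mid-pattern.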