I am working on a school project, so please no exact answers. I have a pandas dataframe that has numerators and denominators rating images of dogs out of 10. When there are multiple dogs in the image, the rating is out of number of dogs * 10. I am trying to adjust it so that for example... if there are 5 dogs, and the rating is 40/50, then the new numerator/denominator is 8/10. Here is an example of my code. I am aware that the syntax does not work in line 3, but I believe it accurately represents what I am trying to accomplish. twitter_archive is the dataframe.
twitter_archive['new_denom'] = 10
twitter_archive['new_numer'] = 0
for numer, denom in twitter_archive['rating_numerator','rating_denominator']:
    if (denom > 10) & (denom % 10 == 0):
        num_denom = denom / 10
        new_numer = numer / num_denom
        twitter_archive['new_numer'] = new_numer
So basically I am checking whether the denominator is above 10 and, if it is, whether it is divisible by 10. If it is, I find out how many times 10 goes into it, and then divide the numerator by that value to get a new numerator. I think my logic for that works fine, but the issue I have is grabbing that row and then adding the new value, in that row, to the new column I created.
Edit: added df head
 | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo | avg_numerator | avg_denom | avg_numer |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8.924206e+17 | 2017-08-01 16:23:56+00:00 | This is Phineas. He's a mystical boy. Only eve... | 13.0 | 10.0 | phineas | None | None | None | None | 0.0 | 10 | 0 |
1 | 8.921774e+17 | 2017-08-01 00:17:27+00:00 | This is Tilly. She's just checking pup on you.... | 13.0 | 10.0 | tilly | None | None | None | None | 0.0 | 10 | 0 |
2 | 8.918152e+17 | 2017-07-31 00:18:03+00:00 | This is Archie. He is a rare Norwegian Pouncin... | 12.0 | 10.0 | archie | None | None | None | None | 0.0 | 10 | 0 |
3 | 8.916896e+17 | 2017-07-30 15:58:51+00:00 | This is Darla. She commenced a snooze mid meal... | 13.0 | 10.0 | darla | None | None | None | None | 0.0 | 10 | 0 |
4 | 8.913276e+17 | 2017-07-29 16:00:24+00:00 | This is Franklin. He would like you to stop ca... | 12.0 | 10.0 | franklin | None | None | None | None | 0.0 | 10 | 0 |
copy/paste head below:
{'tweet_id': {0: 8.924206435553362e+17,
1: 8.921774213063434e+17,
2: 8.918151813780849e+17,
3: 8.916895572798587e+17,
4: 8.913275589266883e+17},
'timestamp': {0: Timestamp('2017-08-01 16:23:56+0000', tz='UTC'),
1: Timestamp('2017-08-01 00:17:27+0000', tz='UTC'),
2: Timestamp('2017-07-31 00:18:03+0000', tz='UTC'),
3: Timestamp('2017-07-30 15:58:51+0000', tz='UTC'),
4: Timestamp('2017-07-29 16:00:24+0000', tz='UTC')},
'text': {0: "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 ",
1: "This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 ",
2: 'This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 ',
3: 'This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us ',
4: 'This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek '},
'rating_numerator': {0: 13.0, 1: 13.0, 2: 12.0, 3: 13.0, 4: 12.0},
'rating_denominator': {0: 10.0, 1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0},
'name': {0: 'phineas', 1: 'tilly', 2: 'archie', 3: 'darla', 4: 'franklin'},
'doggo': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'},
'floofer': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'},
'pupper': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'},
'puppo': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'}}
CodePudding user response:
If you want to use a for loop to get the row values, you can use the iterrows() function.
for idx, row in twitter_archive.iterrows():
    # row is a Series holding one row; idx is that row's index label
    denom = row['rating_denominator']
    numer = row['rating_numerator']
    # You can append the values to a list and concat it with df afterwards
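To make the list idea from the comment concrete, here is a minimal sketch on toy data. The column names `total`, `count`, and `per_item` are made up for illustration and are not from your dataframe, so you will still need to adapt the logic yourself.

```python
import pandas as pd

# Toy data with hypothetical column names (not the asker's dataframe).
df = pd.DataFrame({'total': [6, 9, 20], 'count': [2, 3, 4]})

per_item = []  # one computed value per row, kept in row order
for idx, row in df.iterrows():
    # Any per-row calculation can go here; this one is just a division.
    per_item.append(row['total'] / row['count'])

# Assigning the finished list creates the new column in one step,
# with each value landing in the row it was computed from.
df['per_item'] = per_item
print(df)
```

The list has to end up with exactly one entry per row, so if your calculation is conditional, append a fallback value in the else case as well.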
But a better way is to use the pandas apply function to create a new column from the existing ones.
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2], 'b': [3, 5]})
# axis=1 passes each row to the lambda as a Series
df['c'] = df.apply(lambda x: 'sum_is_odd' if (x['a'] + x['b']) % 2 == 1 else 'sum_is_even', axis=1)
In this case, 'c' is a new column whose value in each row is calculated from that row's 'a' and 'b' values.
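When the per-row logic needs more than one step, a named function can be easier to read than a lambda. The sketch below is only an illustration on the same kind of toy frame. The function name `scale_if_large` and the threshold are invented for the example, so treat it as a pattern rather than a solution to your assignment.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 4], 'b': [3, 5, 8]})

def scale_if_large(row):
    # row is one row of df as a Series, because axis=1 is passed to apply.
    if row['b'] > 4:
        # Condition met: return a value computed from both columns.
        return row['a'] / row['b']
    # Otherwise keep the original value unchanged.
    return row['a']

# apply calls the function once per row and collects the results
# into a new column aligned with the original rows.
df['c'] = df.apply(scale_if_large, axis=1)
print(df)
```

Because the function returns a value for every row, including the rows where the condition is false, no row is left without a result.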