I have several dataframes (df, tmp_df and sub_df), and I want to put a column of tmp_df into a cell of sub_df as a list. My code and dataframes are shown below, but the loop part is not working correctly:
import pandas as pd
df = pd.read_csv('myfile.csv')
tmp_df = pd.DataFrame()
sub_df = pd.DataFrame()
tmp_df = df[df['Type'] == True]
for c in tmp_df['Category']:
    sub_df['Data'], sub_df['Category'], sub_df['Type'] = [list(set(tmp_df['Data']))], tmp_df['Category'], tmp_df['Type']
df:
Data | Category | Type |
---|---|---|
30275 | A | True |
35881 | C | False |
28129 | C | True |
30274 | D | False |
30351 | D | True |
35886 | A | True |
39900 | C | True |
35887 | A | False |
35883 | A | True |
35856 | D | True |
35986 | C | False |
30350 | D | False |
28129 | C | True |
31571 | C | True |
tmp_df:
Data | Category | Type |
---|---|---|
30275 | A | True |
28129 | C | True |
30351 | D | True |
35886 | A | True |
39900 | C | True |
35883 | A | True |
35856 | D | True |
28129 | C | True |
31571 | C | True |
What should I do if I want the following result?
sub_df:
Data | Category | Type |
---|---|---|
[30275,35886,35883] | A | True |
[28129,39900,28129,31571] | C | True |
[30351,35856] | D | True |
CodePudding user response:
You can select the rows with query, then use groupby and agg:
(df.query('Type')  # or 'Type == "True"' if strings
   .groupby('Category', as_index=False)
   .agg({'Data': list, 'Type': 'first'})
)
output:
  Category                          Data  Type
0        A         [30275, 35886, 35883]  True
1        C  [28129, 39900, 28129, 31571]  True
2        D                [30351, 35856]  True
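For anyone who wants to run this without myfile.csv, here is a minimal self-contained sketch. It assumes the df table from the question can be typed in by hand and that Type holds real booleans (not strings). It uses a boolean mask and named aggregation instead of query and a dict, which should give the same result, and then reorders the columns to match the layout asked for.

```python
import pandas as pd

# Data typed in from the question's df table (stands in for myfile.csv)
df = pd.DataFrame({
    "Data": [30275, 35881, 28129, 30274, 30351, 35886, 39900,
             35887, 35883, 35856, 35986, 30350, 28129, 31571],
    "Category": ["A", "C", "C", "D", "D", "A", "C",
                 "A", "A", "D", "C", "D", "C", "C"],
    "Type": [True, False, True, False, True, True, True,
             False, True, True, False, False, True, True],
})

# Keep only the rows where Type is True (assumes a boolean column)
tmp_df = df[df["Type"]]

# One row per Category: collect Data into a list, keep the first Type value
sub_df = (
    tmp_df.groupby("Category", as_index=False)
          .agg(Data=("Data", list), Type=("Type", "first"))
)

# Reorder the columns to Data, Category, Type as in the desired result
sub_df = sub_df[["Data", "Category", "Type"]]
print(sub_df)
```

Note that the code in the question wraps the values in set, which would drop the repeated 28129 that appears in the desired output for category C; aggregating with list keeps duplicates in their original order.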