I'm new to pandas and confused about how to do this. I need to convert this DataFrame into a dictionary with a column that counts repeated values:
import pandas as pd
df = pd.DataFrame({'Name': [['John', 'hock'], ['John','pepe'],['Peter', 'wdw'],['Peter'],['John'], ['Stef'], ['John']],
'Age': [38, 47, 63, 28, 33, 45, 66]
})
and I need something like:
Name Age Repeated:
John 38 4
thanks!
CodePudding user response:
Use DataFrame.explode with GroupBy.size:
df = df.explode('Name').groupby(['Name']).size().reset_index(name='Repeated')
print (df)
Name Repeated
0 John 4
1 Peter 2
2 Stef 1
3 hock 1
4 pepe 1
5 wdw 1
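The question also asks for an Age column and a dictionary, which the output above does not include. Here is a minimal sketch of one way to get both, assuming the age to keep is the one from the first row in which each name appears (the question only shows `John 38 4`, so this rule is a guess). The names `exploded`, `summary` and `as_dict` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'],
             ['Peter'], ['John'], ['Stef'], ['John']],
    'Age': [38, 47, 63, 28, 33, 45, 66],
})

# One row per name; Age is repeated for every name in the original list.
exploded = df.explode('Name')

# Assumption: keep the first Age seen for each name and count its rows.
summary = (
    exploded.groupby('Name', sort=False)
    .agg(Age=('Age', 'first'), Repeated=('Age', 'size'))
    .reset_index()
)

# Dictionary keyed by name, e.g. as_dict['John'] is {'Age': 38, 'Repeated': 4}
as_dict = summary.set_index('Name').to_dict('index')
print(summary)
print(as_dict)
```

If a different rule is wanted (for example the maximum age, or a list of all ages), only the `'first'` aggregation would need to change.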
CodePudding user response:
I can think of something like:
resultDict = {}
for index, row in df.iterrows():
    for value in row["Name"]:
        # start each new name at 0, then add 1 for every occurrence
        if value not in resultDict:
            resultDict[value] = 0
        resultDict[value] += 1
resultDict
Output
{'John': 4, 'Peter': 2, 'Stef': 1, 'hock': 1, 'pepe': 1, 'wdw': 1}
If you want to have it as a dataframe and not a dictionary:
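resultDict = {}
for index, row in df.iterrows():
    for value in row["Name"]:
        # start each new name at 0, then add 1 for every occurrence
        if value not in resultDict:
            resultDict[value] = 0
        resultDict[value] += 1
# build the DataFrame from the names and their counts
pd.DataFrame({"Name": list(resultDict.keys()), "Repeated": list(resultDict.values())})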
Output
| Name | Repeated |
|---|---|
| John | 4 |
| hock | 1 |
| pepe | 1 |
| Peter | 2 |
| wdw | 1 |
| Stef | 1 |
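As a side note, the same counting can probably be done without an explicit loop by using `collections.Counter` from the standard library. This is an alternative technique, not what the answer above uses, and the names `counts` and `result` are only illustrative.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'],
             ['Peter'], ['John'], ['Stef'], ['John']],
    'Age': [38, 47, 63, 28, 33, 45, 66],
})

# Flatten the lists in the Name column and count each name.
counts = Counter(name for names in df['Name'] for name in names)

# Counter is a dict subclass, so it converts to a plain dict or a DataFrame.
result = pd.DataFrame({'Name': list(counts.keys()),
                       'Repeated': list(counts.values())})
print(dict(counts))
print(result)
```

`Counter` keeps the order in which names are first seen, so the rows should come out in the same order as with the loop.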