I am looking to calculate the vote share for a candidate in a particular district for a particular election.
I've got a dataset that gives me the party of the candidate, the district that they ran in, and the year of the election.
However, I have their competitors as well and want to calculate the vote share for one of the candidates, say the Democratic candidate. How would I then place that vote share value in the rows for all of those in that particular race?
Here's an example of a pandas dataframe that I have:
year state district candidate party candidatevotes totalvotes
0 1976 alabama 1 BILL DAVENPORT DEMOCRAT 58906 157170
1 1976 alabama 1 JACK EDWARDS REPUBLICAN 98257 157170
2 1976 alabama 1 WRITEIN NaN 7 157170
3 1976 alabama 2 J CAROLE KEAHEY DEMOCRAT 66288 156362
4 1976 alabama 2 WILLIAM L "BILL" DICKINSON REPUBLICAN 90069 156362
What I want to be able to do is have a column that would take the vote share of the democratic candidate for that particular district and year. But that vote share value be there for all of the candidates in that particular race:
year state district candidate party candidatevotes totalvotes Dem_vote_share
0 1976 alabama 1 BILL DAVENPORT DEMOCRAT 58906 157170 0.374
1 1976 alabama 1 JACK EDWARDS REPUBLICAN 98257 157170 0.374
2 1976 alabama 1 WRITEIN NaN 7 157170 0.374
3 1976 alabama 2 J CAROLE KEAHEY DEMOCRAT 66288 156362 0.424
4 1976 alabama 2 WILLIAM L "BILL" DICKINSON REPUBLICAN 90069 156362 0.424
If helpful: a full dataset can be found here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2
CodePudding user response:
You should use transform()
with a custom function. You can try with:
df['Dem_votes_share'] = df.groupby('party')['candidatevotes'].transform(lambda x: sum(x)/df['candidatevotes'].sum())
Following an mock-up example:
df = pd.DataFrame()
df['party'] = ['D','R','NULL','D','R']
df['candidatevotes'] = [58906,98257,7,66288,90069]
df['candidate'] = ['Bill','Jack','Writein','J Carole','William L']
We get:
party candidatevotes candidate Dem_votes_share
0 D 58906 Bill 0.399309
1 R 98257 Jack 0.600669
2 NULL 7 Writein 0.000022
3 D 66288 J Carole 0.399309
4 R 90069 William L 0.600669
CodePudding user response:
Filter your df for the kind of party you want. Then get the Dem_votes_share
and map its results to the df.
New Input:
df = pd.DataFrame({
'year': [1976, 1976, 1980, 1980, 1976, 1976, 1980, 1980],
'state': ['alabama', 'alabama', 'alaska', 'alaska', 'alabama', 'alabama', 'alaska', 'alaska'],
'district': [1, 1, 1, 1, 2, 2, 2, 2],
'candidate': ['BILL DAVENPORT', 'JACK EDWARDS', 'WRITEIN', 'foo', 'J CAROLE KEAHEY', 'WILLIAM L "BILL" DICKINSON', 'foo2', 'foo3'],
'party': ['DEMOCRAT', 'REPUBLICAN', 'DEMOCRAT', 'REPUBLICAN', 'DEMOCRAT', 'REPUBLICAN', 'DEMOCRAT', 'REPUBLICAN'],
'candidatevotes': [58906, 98257, 50000, 20000, 66288, 90069, 10000, 20000],
'totalvotes': [157163, 157163, 70000, 70000, 156362, 156362, 30000, 30000]
})
year state district candidate party candidatevotes totalvotes
0 1976 alabama 1 BILL DAVENPORT DEMOCRAT 58906 157163
1 1976 alabama 1 JACK EDWARDS REPUBLICAN 98257 157163
2 1980 alaska 1 WRITEIN DEMOCRAT 50000 70000
3 1980 alaska 1 foo REPUBLICAN 20000 70000
4 1976 alabama 2 J CAROLE KEAHEY DEMOCRAT 66288 156362
5 1976 alabama 2 WILLIAM L "BILL" DICKINSON REPUBLICAN 90069 156362
6 1980 alaska 2 foo2 DEMOCRAT 10000 30000
7 1980 alaska 2 foo3 REPUBLICAN 20000 30000
mapping_vals = (
df[df['party'].eq('DEMOCRAT')]
.set_index(['year', 'district'])
.apply(lambda x: x['candidatevotes']/x['totalvotes'],axis=1)
)
print(mapping_vals)
year district
1976 1 0.374808
1980 1 0.714286
1976 2 0.423939
1980 2 0.333333
dtype: float64
df['Dem_vote_share'] = df.set_index(['year', 'district']).index.map(mapping_vals)
print(df)
Output:
year state district candidate party candidatevotes totalvotes Dem_vote_share
0 1976 alabama 1 BILL DAVENPORT DEMOCRAT 58906 157163 0.374808
1 1976 alabama 1 JACK EDWARDS REPUBLICAN 98257 157163 0.374808
2 1980 alaska 1 WRITEIN DEMOCRAT 50000 70000 0.714286
3 1980 alaska 1 foo REPUBLICAN 20000 70000 0.714286
4 1976 alabama 2 J CAROLE KEAHEY DEMOCRAT 66288 156362 0.423939
5 1976 alabama 2 WILLIAM L "BILL" DICKINSON REPUBLICAN 90069 156362 0.423939
6 1980 alaska 2 foo2 DEMOCRAT 10000 30000 0.333333
7 1980 alaska 2 foo3 REPUBLICAN 20000 30000 0.333333