I am not sure how best to describe this (there is probably a more proper term for it).
I have a large dataset of house details (e.g. walls, bathrooms, bedrooms, etc.) that I need to analyze and rank based on their characteristics. I have created a ranking system with "4" being the best and "0" being the worst. For example, a house with 1 bedroom may get a "0" for its bedroom score, but a house with 3 bathrooms may get a "4" for its bathroom score.
Once I associate the ranks with all the characteristics, I plan on creating a weighted average to see which houses are the best.
What is the best way to do this? I need to do this about 20 times (for 20 characteristics), and so far this is the only way I know how to do it. It is quite tedious, especially if I ever need to go back and change anything.
Also, it would be good to better understand how the df.loc function works. I was able to make it work, but I don't quite understand it.
#EXAMPLE ONE, GRADING LAND USE
ParcelsData.loc[ParcelsData["land_use"] == 'Flum/Swim Floodway (Restrected)', 'LandUseGrade'] = 0
ParcelsData.loc[ParcelsData["land_use"] == 'Single Family Residential', 'LandUseGrade'] = 4
ParcelsData.loc[ParcelsData["land_use"] == 'Wasteland, Slivers, Gullies, Rock Outcrop', 'LandUseGrade'] = 0
ParcelsData.loc[ParcelsData["land_use"] == 'Single Family Residential - Common', 'LandUseGrade'] = 4
ParcelsData.loc[ParcelsData["land_use"] == 'Multi Family', 'LandUseGrade'] = 2
#EXAMPLE TWO, STORY
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '1 STORY', 'StoryGrade'] = 4
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '1.5 STORY', 'StoryGrade'] = 2
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '2.0 STORY', 'StoryGrade'] = 3
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '2.5 STORY', 'StoryGrade'] = 2
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '3.0 STORY', 'StoryGrade'] = 2
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == 'RANCH W/BSMT', 'StoryGrade'] = 4
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == 'BI-LEVEL', 'StoryGrade'] = 1
ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == 'SPLIT LEVEL', 'StoryGrade'] = 1
#EXAMPLE THREE, ACRES
ParcelsData.loc[ParcelsData["Acres"] <= .1, 'AcresGrade'] = 1
ParcelsData.loc[ParcelsData["Acres"] <= .2, 'AcresGrade'] = 2
ParcelsData.loc[ParcelsData["Acres"] <= .3, 'AcresGrade'] = 3
ParcelsData.loc[ParcelsData["Acres"] <= .4, 'AcresGrade'] = 7
ParcelsData.loc[ParcelsData["Acres"] <= .5, 'AcresGrade'] = 8
ParcelsData.loc[ParcelsData["Acres"] > .5, 'AcresGrade'] = 9
CodePudding user response:
I'll show this for land_use; the same idea applies to the other columns.
See https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html for more details on Series.map.
land_use_map = {
'Flum/Swim Floodway (Restrected)': 0,
'Single Family Residential': 4,
'Wasteland, Slivers, Gullies, Rock Outcrop': 0,
'Single Family Residential - Common': 4,
'Multi Family': 2,
}
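# Look up each land_use value in the dict and store the result in a new grade column.
# Values that are not keys in land_use_map become NaN.
ParcelsData['LandUseGrade'] = ParcelsData['land_use'].map(land_use_map)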
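To extend this to all 20 characteristics, one option is to keep every mapping in a single dict and loop over it. The sketch below is only an illustration under some assumptions: the toy rows, the `grade_maps` layout, the `Score` column, and the weights are all made up, and the Acres grades are copied from the question as written. For numeric columns such as Acres it uses `pd.cut`, which puts each row in exactly one bin.

```python
import pandas as pd

# Toy data: column names follow the question, the rows are invented.
ParcelsData = pd.DataFrame({
    "land_use": ["Single Family Residential", "Multi Family", "Unknown Use"],
    "HomeDetails_storyheight": ["1 STORY", "BI-LEVEL", "2.0 STORY"],
    "Acres": [0.05, 0.35, 0.80],
})

# Hypothetical layout: source column -> (grade column, value-to-grade mapping).
grade_maps = {
    "land_use": ("LandUseGrade", {
        "Single Family Residential": 4,
        "Multi Family": 2,
    }),
    "HomeDetails_storyheight": ("StoryGrade", {
        "1 STORY": 4,
        "2.0 STORY": 3,
        "BI-LEVEL": 1,
    }),
}
for source_col, (grade_col, mapping) in grade_maps.items():
    # Values that are missing from the mapping become NaN.
    ParcelsData[grade_col] = ParcelsData[source_col].map(mapping)

# Numeric column: bins are right-inclusive, i.e. (-inf, 0.1], (0.1, 0.2], ...
acre_bins = [-float("inf"), 0.1, 0.2, 0.3, 0.4, 0.5, float("inf")]
acre_grades = [1, 2, 3, 7, 8, 9]  # grades copied from the question
ParcelsData["AcresGrade"] = pd.cut(
    ParcelsData["Acres"], bins=acre_bins, labels=acre_grades
).astype(float)

# Weighted average; these weights are placeholders, not a recommendation.
weights = pd.Series({"LandUseGrade": 0.5, "StoryGrade": 0.3, "AcresGrade": 0.2})
weighted = ParcelsData[list(weights.index)].mul(weights, axis=1)
# sum() skips NaN, so an ungraded characteristic contributes 0 to the score.
ParcelsData["Score"] = weighted.sum(axis=1) / weights.sum()

print(ParcelsData)
```

Changing a grade then means editing one dict entry instead of a line of code. On the df.loc part of the question: `df.loc[mask, 'col'] = value` selects the rows where the boolean mask is True, selects (or creates) the named column, and assigns the value there. That is also why the chained `<=` lines for Acres overwrite each other: every row with Acres <= .1 also satisfies <= .2, so the later lines win and every row up to .5 ends with the last grade assigned.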