I have the following data:
0    Ground out of 2
1         1 out of 3
2         1 out of 3
Name: Floor, dtype: object
I want to modify this data so that I can create two columns named first floor and max floor.
Looking at the first item as an example:
0 Ground out of 2
Here the first floor would be 0 and the max floor would be 2, and so on for the other rows.
This is the code I have written to extract the first floor items:
floor_location = []
lower_floors = ['Ground', 'Basement']
for data in df.Floor:
    for char in lower_floors:
        if char in data:
            floor_location.append('0')
    else:
        floor_location.append(data[:2])
When I do this, I get the following output:
['0', 'Gr', '1 ', '1 ']
I am expecting
['0', '1 ', '1 ']
Can someone explain where I am going wrong?
Thanks in advance.
CodePudding user response:
Your loop is written in the wrong order.
But in any case, don't use a loop; use vectorized string extraction and fillna instead:
df['Floor'].str.extract(r'^(\d+)', expand=False).fillna(0).astype(int)
Or for more flexibility (Ground -> 0 ; Basement -> -1…):
(df['Floor'].str.extract(r'^(\w+)', expand=False)
   .replace({'Ground': 0, 'Basement': -1})
   .astype(int)
)
output:
0 0
1 1
2 1
Name: Floor, dtype: int64
As list:
df['Floor'].str.extract(r'^(\d+)', expand=False).fillna(0).astype(int).tolist()
output: [0, 1, 1]
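Since the question asks for both a first floor and a max floor column, the same extraction idea can be stretched to capture both numbers at once. This is only a minimal sketch: it assumes every entry looks like "<floor> out of <max>", the sample frame is rebuilt from the question's data, the group names first and max are made up for illustration, and the Basement -> -1 mapping is the one suggested above.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be representative).
df = pd.DataFrame({"Floor": ["Ground out of 2", "1 out of 3", "1 out of 3"]})

# Two named groups: the leading word or number, and the number after "out of".
# Rows that do not follow this pattern would come back as NaN.
parts = df["Floor"].str.extract(r"^(?P<first>\w+) out of (?P<max>\d+)$")

# Map the named floors to numeric strings, then convert both columns to int.
df["first floor"] = (
    parts["first"].replace({"Ground": "0", "Basement": "-1"}).astype(int)
)
df["max floor"] = parts["max"].astype(int)

print(df)
```

If some rows do not match the pattern (for example a value with no "out of" part), the astype(int) calls will fail on the resulting NaN, so those rows would need to be filled or filtered first.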
CodePudding user response:
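First of all, the indentation of the else is wrong. Attached to the inner for loop, it acts as a for/else clause and runs for every row, even after '0' has been appended. Check all the keywords in a single if instead:
floor_location = []
lower_floors = ['Ground', 'Basement']
for data in df.Floor:
    # any() tests every keyword first, so each row appends exactly once
    if any(word in data for word in lower_floors):
        floor_location.append('0')
    else:
        floor_location.append(data[:2])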
Second, because you are looping through the Floor column, data is a single cell (a string), not a row. So data[:2] cuts the cell down to its first 2 characters, which is why you see 'Gr'.
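To make the difference concrete, here is a small reproduction on a plain list standing in for df.Floor. It assumes the else in the question was aligned with the inner for loop, which is the layout that reproduces the output shown there; the names buggy and fixed are only for illustration.

```python
floors = ["Ground out of 2", "1 out of 3", "1 out of 3"]  # stand-in for df.Floor
lower_floors = ["Ground", "Basement"]

# Assumed original layout: the else belongs to the inner for loop.
# A for/else clause runs whenever the loop ends without a break,
# so data[:2] is appended for every row, including the 'Ground' one.
buggy = []
for data in floors:
    for word in lower_floors:
        if word in data:
            buggy.append("0")
    else:
        buggy.append(data[:2])

# One decision per row: any() checks all keywords before choosing a branch.
fixed = []
for data in floors:
    if any(word in data for word in lower_floors):
        fixed.append("0")
    else:
        fixed.append(data[:2])

print(buggy)  # ['0', 'Gr', '1 ', '1 ']
print(fixed)  # ['0', '1 ', '1 ']
```

Note that the fixed list still holds strings with a trailing space ('1 '); calling int() on each item, or using the vectorized extraction, gives clean integers.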