I would like to extract strings from a column containing sets in a pandas DataFrame. The column looks like this:
0 {s}
1 {B}
2 {m}
3 {H}
4 {b}
...
295 {G}
296 {N}
297 {s}
298 {v}
299 {p}
Name: letters, Length: 300, dtype: object
When I use the str function to extract the text and store it in another column, the output looks like this:
0 0 {s}\n1 {B}\n2 {m}\n3 {H}...
1 0 {s}\n1 {B}\n2 {m}\n3 {H}...
2 0 {s}\n1 {B}\n2 {m}\n3 {H}...
3 0 {s}\n1 {B}\n2 {m}\n3 {H}...
4 0 {s}\n1 {B}\n2 {m}\n3 {H}...
...
295 0 {s}\n1 {B}\n2 {m}\n3 {H}...
296 0 {s}\n1 {B}\n2 {m}\n3 {H}...
297 0 {s}\n1 {B}\n2 {m}\n3 {H}...
298 0 {s}\n1 {B}\n2 {m}\n3 {H}...
299 0 {s}\n1 {B}\n2 {m}\n3 {H}...
Name: str_val, Length: 300, dtype: object
Could anyone explain why it gets converted like this?
letters is the name of the column holding the sets. I would like to create another column, 'comm', which should look like this:
0 s
1 B
2 m
3 H
4 b
The datatype should be string. Any help is much appreciated.
CodePudding user response:
Use a list comprehension (faster than apply) with iter and next, with None (or anything you want) as the default value in case you have empty sets:
df['letter'] = [next(iter(s), None) for s in df['set']]
Example:
set letter
0 {s} s
1 {B} B
2 {m} m
3 {H} H
4 {b} b
5 {} None
Used input:
df = pd.DataFrame({'set': [{'s'}, {'B'}, {'m'}, {'H'}, {'b'}, {}]})
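Since the question also asks for a string datatype, one possible follow-up is to build the new column with the pandas "string" extension dtype. This is only a minimal sketch: it assumes the column is named letters as in the question, and the sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; set() is an empty set.
df = pd.DataFrame({"letters": [{"s"}, {"B"}, {"m"}, set()]})

# Read the single element of each set without modifying it;
# None is used as the default for empty sets.
values = [next(iter(s), None) for s in df["letters"]]

# Build the column with the pandas "string" extension dtype,
# so missing values are stored as <NA> rather than None.
df["comm"] = pd.Series(values, index=df.index, dtype="string")
```

Printing df.dtypes afterwards should show a string dtype for comm instead of object.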
CodePudding user response:
df["comm"] = df["letters"].apply(lambda x: x.pop())
Explanation:
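apply iterates through each row in the letters column, runs the given lambda function on it, and returns a Series made up of the values the lambda returns. The lambda here pops an element out of the set found in each row. Since each row holds a set with exactly one element, .pop() will work for your use case. Be aware that .pop() removes the element from the set in place, so the sets in letters are empty afterwards, and it raises a KeyError on a set that is already empty. If you need to keep the original column intact, use next(iter(x)) instead.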
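The difference between reading an element and popping it can be seen in a small sketch. The sample data is hypothetical and the column names follow the question.

```python
import pandas as pd

# Hypothetical sample data with one-element sets, as in the question.
df = pd.DataFrame({"letters": [{"s"}, {"B"}, {"m"}]})

# next(iter(x)) reads one element and leaves the set untouched.
df["comm"] = df["letters"].apply(lambda x: next(iter(x)))

# .pop() removes the element from the set stored in the column.
popped = df["letters"].apply(lambda x: x.pop())

# Sizes of the sets in the letters column after popping.
remaining = df["letters"].apply(len).tolist()
```

Both approaches return the same letters, but after the .pop() call every set in letters is empty, which matters if the column is used again later.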
CodePudding user response:
It seems you are converting the whole column into a single string, which then gets repeated in every row. You can get the whole column using:
str_val["LettersColumn"] = letters["LettersColumn"]
You should change "LettersColumn" to the names of your columns of course.
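The question does not show the code that produced str_val, so the following is only a guess at the cause: it assumes something like str(df['letters']) was assigned to the new column. The sketch contrasts that with an element-wise conversion, using made-up sample data.

```python
import pandas as pd

# Hypothetical sample data with one-element sets, as in the question.
df = pd.DataFrame({"letters": [{"s"}, {"B"}, {"m"}]})

# str() on a whole Series returns ONE string: the printed form of the column.
whole = str(df["letters"])

# Assigning that single string broadcasts it to every row.
df["str_val"] = whole

# Converting element by element gives one string per row instead,
# although each value is still the text form of a set, braces included.
df["per_row"] = df["letters"].astype(str)
```

Even the element-wise version yields values such as {'s'} rather than s, which is why the other answers pull the element out of each set instead of converting the set to text.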