I have two Series like the ones below. I'll call them s1 and s2.
1 Windows
2 iOS
5 AWS
5 Docker
5 Linux
...
65112 Android
65112 Arduino
65112 Linux
65112 Raspberry Pi
65112 Windows
Name: PlatformWorkedWith, Length: 177060, dtype: object
Respondent
1 Android
1 iOS
1 Kubernetes
1 Microsoft Azure
1 Windows
...
64567 Android
65112 Arduino
65112 Linux
65112 Raspberry Pi
65112 Windows
Name: PlatformDesireNextYear, Length: 190223, dtype: object
How can I combine the two Series, keeping only the rows that have the same index and the same value in both, and save the result into a new Series? I have been trying to use combine, but it does not seem to work. I want the result to be a Series because I want to call value_counts on it.
E.g. Windows appears in both Series under the same index (65112 in s1 and 65112 in s2), so that [index, value] pair is added to s3 (the result). If Windows is missing at index 65112 in either s1 or s2, it is not added to s3.
s3 will look like this:
Respondent
1 Windows
...
65112 Linux
65112 Windows
65112 Raspberry Pi
Thanks
CodePudding user response:
Use GroupBy.size to count each (index, value) pair in the original Series, then keep only the pairs present in both by using Series.loc with Index.intersection, and finally sum the two counts with Series.add:
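# count how often each (index, value) pair occurs in each Series
s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()
# keep only the (index, value) pairs that occur in both
idx = s11.index.intersection(s22.index)
# add the counts of the shared pairs and flatten to a DataFrame
df = s11.loc[idx].add(s22.loc[idx]).rename_axis(('idx','vals')).reset_index(name='count')
print(df)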
idx vals count
0 1 Windows 2
1 65112 Arduino 2
2 65112 Linux 2
3 65112 Raspberry Pi 2
4 65112 Windows 2
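For anyone who wants to try the steps above without the survey file, here is a minimal, self-contained sketch. The sample rows are made up to resemble the output shown in the question (they are not the real data), and the index name Respondent is assumed for both Series.

```python
import pandas as pd

# Made-up sample rows modelled on the question's printout (not the real data).
s1 = pd.Series(
    ["Windows", "iOS", "AWS", "Docker", "Linux",
     "Android", "Arduino", "Linux", "Raspberry Pi", "Windows"],
    index=pd.Index([1, 2, 5, 5, 5, 65112, 65112, 65112, 65112, 65112],
                   name="Respondent"),
    name="PlatformWorkedWith",
)
s2 = pd.Series(
    ["Android", "iOS", "Kubernetes", "Microsoft Azure", "Windows",
     "Android", "Arduino", "Linux", "Raspberry Pi", "Windows"],
    index=pd.Index([1, 1, 1, 1, 1, 64567, 65112, 65112, 65112, 65112],
                   name="Respondent"),
    name="PlatformDesireNextYear",
)

# Count every (index, value) pair in each Series.
s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()

# Keep only the pairs that exist in both Series.
idx = s11.index.intersection(s22.index)

# Sum the per-pair counts and flatten the MultiIndex into columns.
df = (
    s11.loc[idx]
    .add(s22.loc[idx])
    .rename_axis(["idx", "vals"])
    .reset_index(name="count")
)
print(df)
```

With this sample, only the pairs present in both Series survive, each with a count of 2 (one occurrence in s1 plus one in s2).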
EDIT: I am still waiting for a comment, but if the counts in s11 and s22 are not always 1 (i.e. the same value can repeat under the same index), use:
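import pandas as pd

# collapse repeated (index, value) pairs to one entry each
s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()
idx = s11.index.intersection(s22.index)
# level 0 holds the original index, level 1 the values
s3 = pd.Series(idx.get_level_values(1), idx.get_level_values(0))
print(s3)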
1 Windows
65112 Arduino
65112 Linux
65112 Raspberry Pi
65112 Windows
dtype: object
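Since the goal in the question is to call value_counts on the result, here is a small sketch of that last step. The two Series below are invented for illustration and are much smaller than the real ones.

```python
import pandas as pd

# Invented toy data: integer labels stand in for respondent ids.
s1 = pd.Series(["Windows", "Linux", "Linux", "Windows"], index=[1, 1, 2, 2])
s2 = pd.Series(["Windows", "Android", "Linux", "Windows"], index=[1, 1, 2, 2])

# Unique (index, value) pairs of each Series, then the pairs shared by both.
idx1 = s1.groupby([s1.index, s1]).size().index
idx2 = s2.groupby([s2.index, s2]).size().index
idx = idx1.intersection(idx2)

# Rebuild a plain Series: level 0 is the index, level 1 the values.
s3 = pd.Series(idx.get_level_values(1), index=idx.get_level_values(0))

# Count how often each value is shared by both Series.
counts = s3.value_counts()
print(counts)
```

Here the shared pairs are (1, Windows), (2, Linux) and (2, Windows), so Windows is counted twice and Linux once.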
If the counts in s11 and s22 are always 1, meaning each value is unique per index, use:
import pandas as pd

# move the values into the index to get (index, value) pairs
s11 = s1.to_frame().set_index('PlatformWorkedWith', append=True)
s22 = s2.to_frame().set_index('PlatformDesireNextYear', append=True)
idx = s11.index.intersection(s22.index)
s3 = pd.Series(idx.get_level_values(1), idx.get_level_values(0))
print(s3)
1 Windows
65112 Arduino
65112 Linux
65112 Raspberry Pi
65112 Windows
dtype: object
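To decide which of the two variants applies, it may help to check whether any (index, value) pair repeats inside a Series. The sketch below is one way to do that. The helper name pairs_are_unique is hypothetical (it is not a pandas function), and the data is invented.

```python
import pandas as pd


def pairs_are_unique(s: pd.Series) -> bool:
    """Hypothetical helper: True if no (index, value) pair repeats in s."""
    pair_counts = s.groupby([s.index, s]).size()
    return bool(pair_counts.eq(1).all())


# Invented examples: the second Series repeats the pair (1, "Windows").
s_unique = pd.Series(["Windows", "Linux"], index=[1, 1])
s_repeated = pd.Series(["Windows", "Windows"], index=[1, 1])

unique_result = pairs_are_unique(s_unique)
repeated_result = pairs_are_unique(s_repeated)
print(unique_result, repeated_result)
```

If the check returns True for both s1 and s2, the set_index variant should be enough; otherwise the groupby variant is the safer choice.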