intersect 2 series-CodePudding

i have 2 series like this. i'll call it s1 and s2

1             Windows
2                 iOS
5                 AWS
5              Docker
5               Linux
             ...     
65112         Android
65112         Arduino
65112           Linux
65112    Raspberry Pi
65112         Windows
Name: PlatformWorkedWith, Length: 177060, dtype: object
Respondent
1                Android
1                    iOS
1             Kubernetes
1        Microsoft Azure
1                Windows
              ...       
64567            Android
65112            Arduino
65112              Linux
65112       Raspberry Pi
65112            Windows
Name: PlatformDesireNextYear, Length: 190223, dtype: object

how to combine 2 series, just keep the row have same index and value and save it into a new series? i am finding how to use combine but it seem not work. I want the result is series cause i want to use value_counts on it Eg: i want to combine Windowns appear in 2 series have the same index like 65112 in s1 and 65112 in s2 and [index,value] will be added to s3 (result), if Windowns not in index 65112 of s1 or s2, it will not be added to s3 s3 will be like this:

Respondent
1                Windows
...
65112            Linux
65112            Windows
65112       Raspberry Pi

Thanks

CodePudding user response：

Use GroupBy.size for counts in original Series, then filter same indices in both by Series.loc with Index.intersection and last count both with Series.add:

s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()

idx = s11.index.intersection(s22.index)

df = s11.loc[idx].add(s22.loc[idx]).rename_axis(('idx','vals')).reset_index(name='count')
print (df)
     idx          vals  count
0      1       Windows      2
1  65112       Arduino      2
2  65112         Linux      2
3  65112  Raspberry Pi      2
4  65112       Windows      2

EDIT: Still wait for comment, but if there is not only 1 values in s11 and s22 use:

s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()

idx = s11.index.intersection(s22.index)

s3 = pd.Series(idx.get_level_values(1), idx.get_level_values(0))

print (s3)
1             Windows
65112         Arduino
65112           Linux
65112    Raspberry Pi
65112         Windows
dtype: object

If values are always 1 in s11 and s22, it means are unique per indices use:

s11 = s1.to_frame().set_index('PlatformWorkedWith', append=True)
s22 = s2.to_frame().set_index('PlatformDesireNextYear', append=True)

idx = s11.index.intersection(s22.index)

s3 = pd.Series(idx.get_level_values(1), idx.get_level_values(0))

print (s3)
1             Windows
65112         Arduino
65112           Linux
65112    Raspberry Pi
65112         Windows
dtype: object