Home > Blockchain >  intersect 2 series
intersect 2 series

Time:11-20

i have 2 series like this. i'll call it s1 and s2

1             Windows
2                 iOS
5                 AWS
5              Docker
5               Linux
             ...     
65112         Android
65112         Arduino
65112           Linux
65112    Raspberry Pi
65112         Windows
Name: PlatformWorkedWith, Length: 177060, dtype: object
Respondent
1                Android
1                    iOS
1             Kubernetes
1        Microsoft Azure
1                Windows
              ...       
64567            Android
65112            Arduino
65112              Linux
65112       Raspberry Pi
65112            Windows
Name: PlatformDesireNextYear, Length: 190223, dtype: object

how to combine 2 series, just keep the row have same index and value and save it into a new series? i am finding how to use combine but it seem not work. I want the result is series cause i want to use value_counts on it Eg: i want to combine Windowns appear in 2 series have the same index like 65112 in s1 and 65112 in s2 and [index,value] will be added to s3 (result), if Windowns not in index 65112 of s1 or s2, it will not be added to s3 s3 will be like this:

Respondent
1                Windows
...
65112            Linux
65112            Windows
65112       Raspberry Pi

Thanks

CodePudding user response:

Use GroupBy.size for counts in original Series, then filter same indices in both by Series.loc with Index.intersection and last count both with Series.add:

s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()

idx = s11.index.intersection(s22.index)

df = s11.loc[idx].add(s22.loc[idx]).rename_axis(('idx','vals')).reset_index(name='count')
print (df)
     idx          vals  count
0      1       Windows      2
1  65112       Arduino      2
2  65112         Linux      2
3  65112  Raspberry Pi      2
4  65112       Windows      2

EDIT: Still wait for comment, but if there is not only 1 values in s11 and s22 use:

s11 = s1.groupby([s1.index, s1]).size()
s22 = s2.groupby([s2.index, s2]).size()

idx = s11.index.intersection(s22.index)

s3 = pd.Series(idx.get_level_values(1), idx.get_level_values(0))

print (s3)
1             Windows
65112         Arduino
65112           Linux
65112    Raspberry Pi
65112         Windows
dtype: object

If values are always 1 in s11 and s22, it means are unique per indices use:

s11 = s1.to_frame().set_index('PlatformWorkedWith', append=True)
s22 = s2.to_frame().set_index('PlatformDesireNextYear', append=True)

idx = s11.index.intersection(s22.index)

s3 = pd.Series(idx.get_level_values(1), idx.get_level_values(0))

print (s3)
1             Windows
65112         Arduino
65112           Linux
65112    Raspberry Pi
65112         Windows
dtype: object
  • Related