so i have this kind of dataframe and i want to select only these websites that have TLSVersion1.2 and 1.3 ONLY. There is already a similar thing Pandas select rows that contains both values (inclusive)
websiteName TLSVersion strength kex_info \
1 website1.lt TLSv1.2 A dh 2048
2 website1.lt TLSv1.3 A rsa 2048
3 website2.lt TLSv1.0 C rsa 2048
4 website2.lt TLSv1.1 A rsa 2048
5 website2.lt TLSv1.2 A rsa 2048
6 website2.lt TLSv1.3 A ecdh_x25519
7 website3.lt TLSv1.0 A dh 4096
8 website3.lt TLSv1.1 A dh 4096
9 website3.lt TLSv1.2 A dh 4096
10 website3.lt TLSv1.3 A ecdh_x25519
11 website4.lt TLSv1.0 A dh 4096
12 website4.lt TLSv1.1 A dh 4096
13 website4.lt TLSv1.2 A dh 4096
14 website4.lt TLSv1.3 A ecdh_x25519
Desire Output 1: website1.lt : True
CodePudding user response:
You can do this using isin():
import pandas as pd
df = pd.DataFrame({
'websiteName' : ['website1.lt']*2 ['website2.lt']*4 ['website3.lt']*4 ['website4.lt']*4,
'TLSVersion' : ['TLSv1.2', 'TLSv1.3', 'TLSv1.0', 'TLSv1.1', 'TLSv1.2', 'TLSv1.3', 'TLSv1.0', 'TLSv1.1', 'TLSv1.2', 'TLSv1.3', 'TLSv1.0', 'TLSv1.1', 'TLSv1.2', 'TLSv1.3'],
'strength' : ['A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
'kex_info' : ['dh 2048', 'rsa 2048', 'rsa 2048', 'rsa 2048', 'rsa 2048', 'ecdh_x25519', 'dh 4096', 'dh 4096', 'dh 4096', 'ecdh_x25519', 'dh 4096', 'dh 4096', 'dh 4096', 'ecdh_x25519']
})
print(f"Sample input dataframe:\n{df}")
df2 = df[df.TLSVersion.isin(['TLSv1.2', 'TLSv1.3'])]
print(f"\nOutput with rows containing 1.2 and 1.3 only:\n{df2}")
Output
Sample input dataframe:
websiteName TLSVersion strength kex_info
0 website1.lt TLSv1.2 A dh 2048
1 website1.lt TLSv1.3 A rsa 2048
2 website2.lt TLSv1.0 C rsa 2048
3 website2.lt TLSv1.1 A rsa 2048
4 website2.lt TLSv1.2 A rsa 2048
5 website2.lt TLSv1.3 A ecdh_x25519
6 website3.lt TLSv1.0 A dh 4096
7 website3.lt TLSv1.1 A dh 4096
8 website3.lt TLSv1.2 A dh 4096
9 website3.lt TLSv1.3 A ecdh_x25519
10 website4.lt TLSv1.0 A dh 4096
11 website4.lt TLSv1.1 A dh 4096
12 website4.lt TLSv1.2 A dh 4096
13 website4.lt TLSv1.3 A ecdh_x25519
Output with rows containing 1.2 and 1.3 only:
websiteName TLSVersion strength kex_info
0 website1.lt TLSv1.2 A dh 2048
1 website1.lt TLSv1.3 A rsa 2048
4 website2.lt TLSv1.2 A rsa 2048
5 website2.lt TLSv1.3 A ecdh_x25519
8 website3.lt TLSv1.2 A dh 4096
9 website3.lt TLSv1.3 A ecdh_x25519
12 website4.lt TLSv1.2 A dh 4096
13 website4.lt TLSv1.3 A ecdh_x25519
CodePudding user response:
I managed to solve this by dividing it into parts: First, I merge all tlscolumns into one row :
allVersionInCollumn = noDuplicates.groupby(['websiteName'])['TLSVersion'].apply(', '.join).reset_index()
allVersionInCollumn result:
websiteName TLSVersion
0 website1 TLSv1.0, TLSv1.1, TLSv1.2
1 website2 TLSv1.2
2 website3 TLSv1.2, TLSv1.3
3 website4 TLSv1.0, TLSv1.1, TLSv1.2, TLSv1.3
Then I used these commands to split the columns:
splitedTLS = allVersionInCollumn["TLSVersion"].str.split(",", n=0, expand=True)
allVersionSplited = allVersionInCollumn.join(splitedTLS)
allVersionSplited = allVersionSplited.drop(columns="TLSVersion")
allVersionSplited.head(10)
websiteName 0 1 2 3
0 website1 TLSv1.0 TLSv1.1 TLSv1.2 None
1 website2 TLSv1.2 None None None
2 website3 TLSv1.2 TLSv1.3 None None
Then I used following command to sort Python pandas sort multiple columns based on their values and other rows
allVersionSplited= (allVersionSplited
.replace({'None': pd.NA}) # assuming 'None' strings
.melt(id_vars='websiteName')
.assign(TLS=lambda d: d['value'])
.pivot_table(index='websiteName', columns='TLS', values='value', aggfunc='first')
.reset_index()
)
Droping previous columns
allVersionSplited.drop(allVersionSplited.iloc[:, 1:5],inplace=True,axis=1)
allVersionSplited.head()
TLS websiteName TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
0 website1 TLSv1.0 NaN NaN NaN
1 website2 NaN NaN TLSv1.2 NaN
2 website3 NaN NaN TLSv1.2 TLSv1.3
Then removed those rows that didn't have tlv1.0 and tls1.0 support
refined = allVersionSplited[allVersionSplited["TLSv1.0" ].isnull()]
refined = refined[refined["TLSv1.1"].isnull()]
refined.head()
TLS websiteName TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
1 website2 NaN NaN TLSv1.2 NaN
2 website3 NaN NaN TLSv1.2 TLSv1.3
Lastly refined.shape[0]
gives :
2