I have a column
df['COL_1']
and a list of numbers
num_range = list(range(200,281, 5))
The columns contains either words such as UNREADABLE
NOT_PASSIVE
or some values present in the list above, so 200
205
210
etc or nothing.
I am trying to get a sum of how many rows in that column contains a number in the range given.
What I have tried:
df['COL_1'].value_counts(num_range)
I am not sure what else to try, the various trials I have done similar to the above have all failed. I'm new to python, any guidance is highly appreciated.
Python 2.7 and pandas 0.24.2
EDIT:
I was getting errors, as other users have mentioned, my data was not numeric. Fixed this with .astype
or alternatively, redifining target_range as so:
target_range = map(str, range(200, 281, 5))
CodePudding user response:
If you're after the whole sum, and are not interested in the breakout of individual counts,
target_range = range(200, 281, 5)
df["COL_1"].isin(target_range).sum()
Notice that you do not need to cast the range
object to a list
.
If you want the breakout of value counts, see @Corralien's answer.
Details: pandas.DataFrame.isin()
is a function which returns a boolean mask.
>>> import pandas as pd
>>> # Data provided by Corralien
>>> df = pd.DataFrame({'COL_1': ['UNREADABLE', 200, 'NOT_PASSIVE', 205, 210, 200, '', 210, 180, 170, '']})
>>> target_range = range(200, 281, 5)
>>> df.isin(target_range)
COL_1
0 False
1 True
2 False
3 True
4 True
5 True
6 False
7 True
8 False
9 False
10 False
Notice I'm using df.isin()
instead of df["COL_1"].isin()
. If you have multiple columns in your DataFrame that you want to perform this operation on, you can pass a list
of column names. If you want to perform this operation on your entire DataFrame, you can simply use df.isin()
.
The .isin()
method returns a boolean mask. Since bool
is a subtype of int
, you can simply call sum()
on the resulting DataFrame to sum up all the 1s and 0s to get your final tally of all the rows which match your criterion.
CodePudding user response:
IIUC, you can try:
df = pd.DataFrame({'COL_1': ['UNREADABLE', 200, 'NOT_PASSIVE', 205, 210,
200, '', 210, 180, 170, '']})
out = df.loc[df['COL_1'].apply(pd.to_numeric, errors='coerce')
.isin(num_range), 'COL_1'] \
.value_counts()
>>> out
200 2
210 2
205 1
Name: COL_1, dtype: int64
>>> out.sum()
5
CodePudding user response:
You have a column in a DataFrame as a list of items and also have another list of values. and you want to count how many items from dataframe column exist in the list. right?
so use this:
count = 0
for i in df['COL_1']:
if i in num_range:
count =1
in each iteration on your column, if the value exist in the list, count variable added by one.