I've searched exhaustively for a fix for what is wrong with my pandas upload, but had no luck. I would greatly appreciate some help.
I'm trying to run a machine learning algorithm (apriori) in Python and I have a CSV file to upload.
What my CSV file looks like in Notepad
Here is my Python code and resulting error message:
Photo of Python code and error message
I've tried pasting the code using CTRL K but it's not working.
CodePudding user response:
You'll want to use the delim_whitespace=True parameter, which gives you one row per transaction. You can then split each row, apply set, and feed the result into apriori. (In pandas 2.2 and later, delim_whitespace is deprecated in favor of sep=r'\s+'.)
Given a sample text file containing:
test
also,test
You can run the following:
import pandas as pd
from apyori import apriori
df = pd.read_csv('Ready Apriori DWK.csv', header=None, delim_whitespace=True, names=['data'])
results = list(apriori(df['data'].str.split().apply(set)))
print(results)
Output
[RelationRecord(items=frozenset({'also,test'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'also,test'}), confidence=0.5, lift=1.0)]),
RelationRecord(items=frozenset({'test'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'test'}), confidence=0.5, lift=1.0)])]
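The same "one transaction per line" idea can also be sketched without pandas, using the standard csv module, which accepts rows with differing numbers of fields. This is only a rough sketch, not the method above: the sample data and the helper name read_transactions are made up for illustration, and it assumes the items in each line are comma-separated.

```python
import csv
import io

# Hypothetical sample standing in for the real file; rows have differing field counts.
SAMPLE = "W010638C\n07-3000-300,7-3000-300\nW010665,W216962\n"


def read_transactions(handle):
    """Return one set of items per non-empty CSV row (illustrative helper)."""
    transactions = []
    for row in csv.reader(handle):
        # Strip stray whitespace and skip empty fields.
        items = {field.strip() for field in row if field.strip()}
        if items:
            transactions.append(items)
    return transactions


transactions = read_transactions(io.StringIO(SAMPLE))
print(transactions)
```

For the real file you would pass open('Ready Apriori DWK.csv', newline='') instead of the StringIO object, and the resulting list could then be handed to apriori(transactions).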
CodePudding user response:
This approach reads the CSV with more columns than needed, then drops the unused columns via isna() and loc.
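import pandas as pd
# names=range(20) reserves up to 20 columns; columns that are entirely NaN are then dropped
df = pd.read_csv('your.csv', header=None, names=range(20)) \
    .loc[:, lambda x: ~x.isna().all()]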
print(df)
Result
              0           1
0      W010638C         NaN
1   07-3000-300  7-3000-300
2       W010665     W216962
3       W015015         NaN
4      W015183A         NaN
5      W001013J         NaN
6      W000102C         NaN
7      07-0017N     7-0017N
8      WC000286         NaN
9       W017221         NaN
10     W000120C         NaN
11      W017814         NaN
etc ...
Note that your full data has rows with more than 2 columns, but my test subset only had a maximum of 2.
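To go from that padded DataFrame to something apriori can consume, one option is to drop the NaN padding row by row. The following is a minimal sketch under stated assumptions: the sample data is invented, MAX_FIELDS is an assumed upper bound on items per line, and dtype=str is added here so that item codes are kept as text.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file.
SAMPLE = "W010638C\n07-3000-300,7-3000-300\nW010665,W216962\n"
MAX_FIELDS = 20  # assumed upper bound on the number of items in one line

# Read with more columns than needed, keeping every value as a string.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=list(range(MAX_FIELDS)),
    dtype=str,
)

# Drop the columns that are entirely NaN, as in the answer above.
df = df.loc[:, lambda x: ~x.isna().all()]

# Turn each row into a list of its non-missing items.
transactions = [row.dropna().tolist() for _, row in df.iterrows()]
print(transactions)
```

The resulting list of lists could then be passed to apriori(transactions), in the same way as the sets in the first answer.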