I have collected data from a survey in order to perform a choice based conjoint analysis.
I have preprocessed and clean data with python in order to use them in R.
However, when I apply the function dfidx on the dataset I get the following error: the two indexes don't define unique observations.
I really do not understand why. Before creating the .csv file I checked if there were duplicates through the pandas function final_df.duplicated().sum()
and its out put was 0 meaning that there were no duplicates.
Can please some one help me to understand what I am doing wrong ?
Here is the code:
df <- read.csv('.../survey_results.csv')
df <- df[,-c(1)]
df$Platform <- as.factor(df$Platform)
df$Deposit <- as.factor(df$Deposit)
df$Fees <- as.factor(df$Fees)
df$Financial_Instrument <- as.factor(df$Financial_Instrument)
df$Leverage <- as.factor(df$Leverage)
df$Social_Trading <- as.factor(df$Social_Trading)
df.mlogit <- dfidx(df, idx = list(c("resp.id","ques"), "position"), shape='long')
Here is the link to the dataset that I am using https://github.com/AlbertoDeBenedittis/conjoint-survey-shiny/blob/main/survey_results.csv
Thank you in advance for you time
CodePudding user response:
The function dfidx()
is build for data frames "for which observations are defined by two (potentialy nested) indexes" (ref).
I don't know if/how it could work with three idx
s. Especially that, in your df, there aren't any duplicates ONLY when considering the combinations of the three columns you mention above (resp.id
, ques
and position
).
One solution to this problem is to "combine" the two columns resp.id
and ques
into one (called for example resp.id.ques
) with paste(...)
.
df$resp.id.ques <- paste(df$resp.id, df$ques, sep="_")
Then you can write the following line which should work just fine:
df.mlogit <- dfidx(df, idx = list("resp.id.ques", "position"))