Tricks to exceed column limitations in SQL Database-CodePudding

Hello swarm intelligence!

I have the following use case: For every movie that is requested by a user, I create a number of tags for that specific movie, derived from several sources (actors, plot etc.. ).

I will use this data for associaton mining.

The problem: If I use the movie for rows and the tags for columns, the tags will easily exceed the technical limitations of 3000 columns ( there is even more actors, and then plot keywords etc)

Is there any way, I can organize this data to then use it for (quick) association mining?

Thanks a lot

CodePudding user response：

Don't put tags in columns. Instead create a separate table, named something like movie_tags with two columns, movie_id and tag. Put each tag in a separate row of that table.

This is known as "normalizing" your data. Here's a nice walkthrough with an example very similar to yours.

Edit: Let's say you have a catalog of movies about the Italian Mafia in New York City in the 20th century. Let's say the movies are

1  Godfather
2  Goodfellas
3  Godfather II

Then your movie_tags table might contain these rows.

1 Gangsters
2 Gangsters
3 Gangsters
1 Francis Ford Coppola
3 Francis Ford Coppola
2 Martin Scorsese

Pro tip If you find yourself thinking about putting lots of data items with the same meaning in their own columns, you probably need to normalize the data and add appropriate tables.