Projects - MythBayes

Introduction

Bayesian statistical analysis is the current hot topic for spam filtering, where it seems to work exceptionally well. The uses for Bayesian filtering are not limited to spam, though: it has no predetermined rules, it just learns the difference between two groups of messages.

This project is an attempt to use Bayesian filtering techniques to filter the TV program listings in MythTV and recommend programs you might be interested in watching.

For example, if you watch a lot of "Buffy the Vampire Slayer", it might notice that the words "demon", "witch" and "magic" show up a lot in the program descriptions. It might then recommend "Charmed", since that description probably contains the same words. As time goes on it will (hopefully) get better at recommending, since it will learn more about what you like to watch.
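The idea above can be sketched in a few lines of Python. This is only an illustration of Graham-style naive Bayes scoring as used in spam filters; the exact formula MythBayes uses is not stated here, and the function names, the clamping values and the sample word counts are all made up for the example.

```python
import math
import re


def tokenize(text):
    """Split a program description into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def token_probability(token, good_counts, bad_counts, n_good, n_bad):
    """Estimate how strongly one token suggests a "good" program.

    good_counts / bad_counts map tokens to the number of good / bad
    programs containing them; n_good / n_bad are the program totals.
    """
    g = good_counts.get(token, 0) / max(n_good, 1)
    b = bad_counts.get(token, 0) / max(n_bad, 1)
    if g + b == 0:
        return 0.5  # never seen: no evidence either way
    # Clamp so that a single token can never be absolutely decisive.
    return min(0.99, max(0.01, g / (g + b)))


def combine(probabilities):
    """Combine per-token probabilities into one score (naive Bayes)."""
    logit = sum(math.log(p) - math.log(1 - p) for p in probabilities)
    # Numerically stable logistic function.
    if logit >= 0:
        return 1 / (1 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1 + e)


def rate(description, good_counts, bad_counts, n_good, n_bad):
    """Score a description between 0 (dislike) and 1 (like)."""
    probs = [token_probability(t, good_counts, bad_counts, n_good, n_bad)
             for t in tokenize(description)]
    return combine(probs) if probs else 0.5


# Hypothetical counts after recording a lot of "Buffy the Vampire Slayer".
good = {"demon": 8, "witch": 5, "magic": 6}
bad = {"demon": 1, "football": 9, "news": 12}

charmed = rate("Three witch sisters use magic to fight a demon", good, bad, 10, 40)
evening = rate("Evening news and football results", good, bad, 10, 40)
```

With these invented counts, the "Charmed"-like description scores far higher than the news bulletin, because "witch", "magic" and "demon" have mostly been seen in recorded programs.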

This project is a work in progress, so don't expect it to work perfectly. Comments and suggestions are gratefully received.

Data Sources

The algorithms need to get their data from somewhere to know what is "good" and "bad". At the moment, all programs start out as "bad" (they are learnt from the program listings). When a program shows up in the recorded or oldrecorded databases, it is unlearnt from the "bad" database and learnt as "good". The assumption is that anything you record is something you're interested in (this will not always hold, but we might be able to fix that later).
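The learn/unlearn bookkeeping described above might look roughly like the following sketch. The class and method names are hypothetical, and counting each word once per program is an assumption made for the example rather than a description of MythBayes' internals.

```python
from collections import Counter


class TokenStore:
    """Per-label token counts for "good" and "bad" programs (illustrative)."""

    def __init__(self):
        self.tokens = {"good": Counter(), "bad": Counter()}
        self.programs = {"good": 0, "bad": 0}

    @staticmethod
    def _words(description):
        # Count each distinct word once per program.
        return set(description.lower().split())

    def learn(self, description, label):
        self.tokens[label].update(self._words(description))
        self.programs[label] += 1

    def unlearn(self, description, label):
        self.tokens[label].subtract(self._words(description))
        # Unary plus drops entries whose count fell to zero or below.
        self.tokens[label] = +self.tokens[label]
        self.programs[label] -= 1

    def mark_recorded(self, description):
        """Move a program from "bad" to "good" once it has been recorded."""
        self.unlearn(description, "bad")
        self.learn(description, "good")


store = TokenStore()
# Everything in the listings starts out as "bad".
store.learn("a demon attacks", "bad")
store.learn("evening news", "bad")
# The first program later appears among the recordings.
store.mark_recorded("a demon attacks")
```

After `mark_recorded`, the word "demon" counts only towards "good", and each label holds one program.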

Usage

The current version runs against MythTV 0.18. Generally you should run MythBayes just after mythfilldatabase. This is for two reasons:

  1. To learn the new programs that have just been imported.
  2. mythfilldatabase just wiped out all the probabilities from the program table and we want to put them back. :)

Normally you would execute it like:

mythbayes --cleandb --learn --rateall

This tells it to nuke any old data from its databases (--cleandb), learn any programs it hasn't seen before in any of the databases (--learn), and calculate probabilities for all the programs in the program table (--rateall).

MythBayes has several command-line options which may be useful to you:

--learn Learn any programs in any of the databases that haven't already been learnt.
--rateall Calculate the probabilities for all the programs in the program table.
--ratenew The same as --rateall, except that probabilities are only calculated for programs that don't have one yet. Remember that mythfilldatabase wipes out all probabilities anyway, and that if MythBayes has learnt more in the meantime, this won't recalculate the probabilities on records that already have them.
--reinit Nuke all the Bayesian data from the database and start from scratch.
--reprob Recalculate all the token probabilities. This happens during --learn anyway.
--cleandb Nuke any old data from the database not needed anymore.
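The difference between --rateall and --ratenew can be pictured with a small Python sketch. It is not MythBayes code: the list of dicts stands in for the program table, `None` stands in for "no probability yet", and the function names are invented for the illustration.

```python
def rate_all(programs, rate):
    """Recalculate the probability of every program."""
    for program in programs:
        program["probability"] = rate(program)
    return len(programs)


def rate_new(programs, rate):
    """Only fill in programs that have no probability yet."""
    updated = 0
    for program in programs:
        if program["probability"] is None:
            program["probability"] = rate(program)
            updated += 1
    return updated


listings = [
    {"title": "Old entry", "probability": 0.2},   # rated before more learning
    {"title": "New entry", "probability": None},  # just imported
]

# Pretend the filter has learnt more and would now rate everything at 0.9.
changed = rate_new(listings, lambda program: 0.9)
```

Here `rate_new` touches only the new entry, so the old entry keeps its stale 0.2 until something like `rate_all` is run.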

At the moment nothing is added to the Myth frontend. All MythBayes does is add a field to the program database with a probability in it. The higher the probability, the more likely you are to like the program (hopefully).
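Since the result is just a field in the program table, a recommendation list is essentially one sorted query. The sketch below shows the shape of such a query in Python. It makes two loud assumptions: the column name `bayes_probability` is hypothetical (the real field name is not given here), and an in-memory SQLite table stands in for MythTV's MySQL database so that the example runs on its own.

```python
import sqlite3


def top_programs(conn, limit=10):
    """Return (title, starttime, probability) rows, best matches first."""
    cursor = conn.execute(
        "SELECT title, starttime, bayes_probability "  # hypothetical column
        "FROM program "
        "WHERE bayes_probability IS NOT NULL "
        "ORDER BY bayes_probability DESC, starttime ASC "
        "LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Stand-in for the real program table, with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE program (title TEXT, starttime TEXT, bayes_probability REAL)"
)
conn.executemany(
    "INSERT INTO program VALUES (?, ?, ?)",
    [
        ("Evening News", "2005-06-01 18:00:00", 0.03),
        ("Charmed", "2005-06-01 20:00:00", 0.97),
        ("Angel", "2005-06-01 21:00:00", 0.91),
        ("Unrated Show", "2005-06-01 22:00:00", None),
    ],
)

recommended = top_programs(conn, limit=2)
```

Against a real installation the same SELECT would be issued through a MySQL client, with the actual field name substituted.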

There are a couple of really quick and dirty shell scripts provided to do something useful with the data:

genhtml.sh Issues a few SQL queries and outputs some HTML listing recommended programs.
progdetail.cgi A CGI script that is linked from the HTML that sqlquery outputs. It shows the program description, highlighting the words in the description to show how they contribute to the score.
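The highlighting idea can be sketched as follows. This is not the code of progdetail.cgi; the function name, the CSS class names and the 0.2 threshold are assumptions chosen for the example.

```python
import html
import re


def highlight(description, token_probs, threshold=0.2):
    """Wrap words with a strong good/bad signal in <span> tags.

    token_probs maps lowercase words to a probability between 0 and 1.
    Words within `threshold` of 0.5 are considered neutral and left alone.
    """
    parts = []
    # The capturing group makes re.split keep the words as well as the
    # punctuation and whitespace between them.
    for piece in re.split(r"(\w+)", description):
        text = html.escape(piece)
        p = token_probs.get(piece.lower())
        if p is not None and abs(p - 0.5) >= threshold:
            css = "good" if p > 0.5 else "bad"
            text = f'<span class="{css}" title="{p:.2f}">{text}</span>'
        parts.append(text)
    return "".join(parts)


marked_up = highlight("A demon & a witch", {"demon": 0.97, "a": 0.5})
```

Everything, including the text between words, goes through `html.escape`, so a description containing `&` or `<` cannot break the generated page.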

Extra Toys

Also provided in this package:

The Future

Some ideas for the future of MythBayes (let me know what you think :)...

References

Contact Me
Site last updated: 26th June, 2013