User 2064 | 6/25/2015, 9:12:09 AM
I would like to use GraphLab to analyse user behavioural patterns. If I had a dataset that said:
User | Where they started | Where they ended
I can imagine how to use general techniques to predict where a user may go next given where they are now. That bit seems quite well documented and pretty standard.
However, I think I'm missing the correct terminology to search for how to construct this dataset efficiently, on the fly. One example might be Netflix-style programme watching, where the "start" is what has just been watched and the "target" is what to watch next. Another real-world example might be Google-style search, where a user is typing letters into a search bar and suggestions are being made.
In the latter case you might imagine having incoming data like:
User | Partial search string 1
User | Partial search string 2
User | Partial search string 3
User | Clicked on a suggestion
More explicitly, that might be:
Fred | 'f'
Fred | 'fo'
Fred | 'foo'
Fred | clicked on 'foobar'
That could be translated to:
User | 'f' | 'foobar'
User | 'fo' | 'foobar'
User | 'foo' | 'foobar'
And then that seems like it might be a good standard-looking dataset to do training on. This could be extended to have entries for suggested "targets" along with whether the user actually clicked on them to build up a list of negative examples too, but I'm trying to keep it simple.
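To make the translation from raw events to (user, start, target) rows concrete, here is a minimal sketch in plain Python, with no GraphLab calls. The event labels 'typed' and 'clicked' and the function name label_sessions are made up for illustration. It assumes events arrive in order for each user and that a click ends that user's current run of partial strings.

```python
from collections import defaultdict

def label_sessions(events):
    """Turn a stream of (user, kind, value) events into (user, start, target) rows.

    kind is either 'typed' or 'clicked' (hypothetical labels for this sketch).
    """
    pending = defaultdict(list)  # user -> partial strings seen since the last click
    rows = []
    for user, kind, value in events:
        if kind == "typed":
            # Buffer the partial string until we learn what the user clicked on.
            pending[user].append(value)
        elif kind == "clicked":
            # The click closes the run: label every buffered partial string with it.
            for prefix in pending.pop(user, []):
                rows.append((user, prefix, value))
    return rows

events = [
    ("Fred", "typed", "f"),
    ("Fred", "typed", "fo"),
    ("Fred", "typed", "foo"),
    ("Fred", "clicked", "foobar"),
]
rows = label_sessions(events)
```

Partial strings that are never followed by a click just sit in the buffer in this sketch, so a real stream would presumably need some kind of timeout to discard them.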
The question is: what is the canonical way of taking this stream of data and tying it together with the eventual thing that a user clicked on? It feels like I'm missing some obvious technique. How would one approach this using GraphLab? Perhaps this is always something that would be done outside GraphLab, with the resulting "clean" dataset then sent into GraphLab?