There are many different kinds of data problems in this world: some are tackled very easily and some require very complicated solutions. However, some problems might be very difficult to solve at all, regardless of how complex the attempted solutions are. Today I want to talk about a key aspect of prediction problems: the degree of determinism of the problem at hand. I’ll start with some examples that align more with reinforcement learning, mainly games, and then move on to examples that are more closely related to traditional regression/classification problems.
Was beating the world experts in the game of Go a difficult problem? Yes, it was, and it has been one of the crowning achievements of deep learning so far. But it was also a very deterministic problem, with known rules and known agents: a problem that requires a very complicated solution, yes, but one that comes from a very simple framework. You know what you have to do, you know the consequences of every action and you can practice as much as you want. Games like this one are as deterministic as can be. Solving them might not be trivial, but the difficulty comes from the complexity of learning the potential outcomes of the game, not from not knowing how the game is played or what consequences actions have.
Now, imagine that a game of Go contained a random component, where there was some probability that moving a piece to one place would cause a completely different action to happen. Can we learn how to play a game like this? If the game is now less deterministic (there’s noise in it), then being good at it becomes way harder, and the more random it becomes, the less likely it is that a player will be able to become skilled at it. If the game were fully random, with every action leading to an unknown consequence, then it would simply be impossible to learn, regardless of how much we played.
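To make this a bit more concrete, here is a minimal sketch of a toy game, far simpler than Go and entirely made up for illustration. There are a few possible moves and only one of them wins; with probability `noise` the move you chose is swapped for a random one. The learner tries moves at random, averages the rewards it saw for each one, and guesses which move is the winner. All the names and parameters here (`play`, `learn_best_action`, `success_rate`, `noise`, and so on) are my own assumptions for the sketch, not part of any library.

```python
import random


def play(action, best_action, n_actions, noise, rng):
    """Play one move. With probability `noise` the chosen move is
    replaced by a uniformly random one before the reward is decided."""
    if rng.random() < noise:
        action = rng.randrange(n_actions)
    return 1.0 if action == best_action else 0.0


def learn_best_action(n_actions, noise, n_plays, rng):
    """Try random moves, average the observed rewards per move, and
    report whether the move with the highest average is the true winner."""
    best_action = rng.randrange(n_actions)  # hidden from the learner
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(n_plays):
        action = rng.randrange(n_actions)
        totals[action] += play(action, best_action, n_actions, noise, rng)
        counts[action] += 1
    means = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    guess = max(range(n_actions), key=means.__getitem__)
    return guess == best_action


def success_rate(noise, n_actions=5, n_plays=200, n_trials=500, seed=0):
    """Fraction of independent learners that identify the winning move."""
    rng = random.Random(seed)
    wins = sum(
        learn_best_action(n_actions, noise, n_plays, rng)
        for _ in range(n_trials)
    )
    return wins / n_trials


if __name__ == "__main__":
    for noise in (0.0, 0.5, 0.9, 1.0):
        print(f"noise={noise:.1f}  success rate={success_rate(noise):.2f}")
```

With no noise the learner should find the winning move essentially every time. As `noise` approaches 1 the success rate should fall toward what you would get by guessing (one in `n_actions`), and in the fully random case no amount of extra play changes that.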
Some problems are like a deterministic game: they are more clearly determined in nature and their solution is more fixed (whether that solution is simple or complex). Others are much more random, and the degree to which we can learn the real “game” as opposed to the randomness around it is not very well understood. A problem like classifying a picture as a cat or a dog is much more deterministic, since all we need is enough examples to do a good job, while a problem like sports betting or predicting the weather is much more difficult.
The perceived non-deterministic nature of a problem usually relates to the number of potential variables that affect its outcome and to our ability to create or observe examples of it. The variables that affect the outcome of something like a baseball game may range from someone yelling at the game, to the weather, to what each player ate for breakfast that day. Whether a picture shows a dog or a cat, on the other hand, depends only on whether it fits a known canon of what a dog or a cat is expected to look like, according to what we all know dogs and cats are. I know I can learn that canon from looking at pictures of dogs and cats, but I don’t know what information I need to gather to successfully predict baseball games. In one case my world is constrained; in the other it’s way less determined. Both processes are predictable in the end, if we possessed all the information, but in one case acquiring enough information to make an accurate prediction is way harder than in the other.
In terms of making predictions, easier problems are usually defined by very strong relationships between the variable we want to predict and the variables we observe, while harder problems are associated with complicated relationships that may be hard to distinguish from the relationships we would expect from random chance. If we don’t know what to measure, we might measure many things that have nothing to do with our problem and end up with misleading results. That’s why the solution to not knowing what to measure is never to measure everything.
If you have a non-deterministic problem, especially if the number of examples you have is heavily constrained, then you have to be very intelligent about which observable variables you measure and about how you account for the risk of spuriously finding variables that predict your desired outcome just because you’ve searched for long enough.
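Here is a small sketch of what “searching for long enough” can look like. Everything in it is pure noise: a random target and a pile of random candidate variables that have nothing to do with it. We keep the candidate that correlates best with the target on a small training sample, then check how that same candidate does on data it was not selected on. The function names and sample sizes are assumptions I picked for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_feature(n_features, n_train=20, n_test=20, seed=0):
    """Search random features for the one most correlated with a random
    target on the training half, and return (train_r, test_r) for it."""
    rng = random.Random(seed)
    n = n_train + n_test
    target = [rng.gauss(0, 1) for _ in range(n)]
    best_train, best_test = 0.0, 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n)]
        r_train = pearson(feature[:n_train], target[:n_train])
        if abs(r_train) > abs(best_train):
            best_train = r_train
            # Score the selected feature on examples it was not chosen on.
            best_test = pearson(feature[n_train:], target[n_train:])
    return best_train, best_test


if __name__ == "__main__":
    for n_features in (1, 10, 100, 2000):
        train_r, test_r = best_spurious_feature(n_features)
        print(f"{n_features:5d} candidates: "
              f"train r={train_r:+.2f}, held-out r={test_r:+.2f}")
```

The more candidates you search through, the stronger the best training correlation tends to look, even though none of the variables carries any information. The held-out correlation is the honest number, and for noise it should hover around zero. Holding data back like this is only one of several ways to guard against this kind of false discovery.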
Imagine you were tasked with predicting the output of a machine, using the output of as many other similar machines as you wanted. All of these machines are true random number generators (but you don’t know that). How long would it take for you to give up? Would you ever?