Babies versus science

The rewards and punishments of being two

Imagine if you woke up one day and found yourself unexpectedly surrounded by presents, and then a few days later you woke up to find chocolate eggs all over your room.

Would you rejoice in your good fortune or question whether you are hallucinating?

Last week my little Max celebrated his second birthday and a few days later enjoyed his first visit from the Easter Bunny. This week of unforeseen bounty must have been all the more surprising given our recent need to hand out some of his first genuine punishments. I can’t imagine how his little brain is trying to make sense of it all.

Now two, Max is increasingly curious and adventurous. Testing the limits of his physical ability and his parents’ patience brings unexpected rewards and punishments.

While Max is extremely coordinated, his language is still quite poor. So I doubt that any discussions during the preceding days about bunnies and parties by his big sister Susie made any sense to him (particularly in the absence of any bunnies or parties). From his perspective there was nothing special about these particular days, no anticipation or excitement in the lead-up.

The terrible twos

I have been thinking a lot recently about how best to manage rewards and punishments with little Max, particularly as he’s now in the terrible twos.

When my daughter was the same age she was comparatively lacking in coordination but extremely good with language. So it was possible to explain and warn her about impending punishments and even “negotiate” (as much as you can negotiate with a two-year-old) a desired reward for good behaviour.

With Max, we can’t explain or negotiate. He is being forced to learn through much more basic associations between his own actions and any positive or negative outcomes that follow.

At work I recently helped coordinate a graduate neuroscience course. Complementing the week of lectures, the group assignments required students to review one of the best-understood processes in behavioural neuroscience: learning through rewards and punishments.

More than 20 years ago the somewhat counter-intuitive discovery was made that the brain does not code rewards or punishments on an absolute scale, with a maximum response for the best outcomes and a minimal or negative response for the worst. Rather, outcomes are coded relative to their expected value, as a “Reward Prediction Error”.

This probably explains why my brother was furious the day my parents gave him a new bike lock and calculator for school. He would have been happy to receive either of these things on any other day, but not on his 16th birthday (we both still laugh about this as the worst birthday ever!).

While this valuing of reward as “relative to expected” is a little counter-intuitive, it is certainly a more efficient system. It means the brain does not constantly need to relearn that chocolate tastes good and it hurts if you hit your thumb with a hammer. But if one day you eat a chocolate that turns out to be filled with aniseed, it might be worth remembering to avoid that particular type of chocolate in the future.
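For the more code-minded, here is a rough sketch of the idea. It assumes the simplest possible learning rule (nudge the expectation towards each outcome by a fixed fraction of the error), and the function names, the learning rate of 0.2 and the reward values of 1.0 and -1.0 are all made up for illustration. It is a toy, not a model of what actual neurons compute.

```python
def prediction_error(reward, expected):
    """Reward Prediction Error: how much better or worse the outcome was than expected."""
    return reward - expected


def update_expectation(expected, reward, learning_rate=0.2):
    """Move the expectation towards the outcome by a fraction of the error.

    The fixed learning rate is an illustrative assumption.
    """
    return expected + learning_rate * prediction_error(reward, expected)


def simulate(rewards, expected=0.0, learning_rate=0.2):
    """Return the prediction error produced by each outcome in a sequence."""
    errors = []
    for reward in rewards:
        errors.append(prediction_error(reward, expected))
        expected = update_expectation(expected, reward, learning_rate)
    return errors


# Twenty ordinary chocolates (reward 1.0), then one filled with aniseed (reward -1.0).
errors = simulate([1.0] * 20 + [-1.0])
```

The first chocolate produces a large positive error because nothing was expected. By the twentieth the error has shrunk to almost nothing, since chocolate tasting good is old news. The aniseed one then produces a large negative error, which is exactly the surprise worth remembering.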

It is only under certain circumstances, such as those underlying many addictions, that this system of reward/punishment signalling appears to break down (even potentially my own self-confessed addiction to baby smiles).

So thinking back to little Max, with his two-year-old brain still learning the positive and negative outcomes of his behaviour, I can’t help but think how confusing life must be for him.

In some cases it is heart-wrenching when his cheeky grin turns to tears as we carry him off to his room – an inevitable consequence of his need to push the boundaries and our need to define them. At this point I guess we are lucky that his indiscretions haven’t gone beyond throwing food or pulling his sister’s hair. I am not looking forward to the teenage years, when the kids are capable of getting themselves into real trouble!

Not the reward you expected

In other cases the results are a little more amusing, such as a couple of weeks ago when I was amazed that he was willing to do his business in the toilet when I sat him on the seat. I was so impressed that in a desperate effort to “reinforce” this as a positive experience I ran around the house looking for some random toy that I could give him as a reward.

I gave him a big hug and repeatedly attempted to explain, in the most positive, high-pitched voice I could conjure, that he was a very good boy and was getting this toy for doing a poo in the toilet. In response he gave me a huge proud smile, grabbed the toy, ran straight over and threw it in the toilet.

“Mum will get excited every time something goes in the toilet” was not exactly the point I was trying to make!

While I still marvel at how much can be communicated without language (as discussed in a previous post), I am beginning to appreciate how subtleties can get lost if we are limited to a simple code for good vs bad. I guess the beauty of our brain’s use of Reward Prediction Error is that it allows us to gradually learn more and more intricate and complex associations between our actions and their consequences. At the same time we can continue to enjoy the unexpected.