Brain Science – Good to Press the Bell

Behaviour X is something that makes me feel weird if I do not do it.

– Item 4, The Self-Report Habit Index

Goal devaluation, for example, in the case of food, by poisoning or satiety associated with specific flavours, should be impervious for habit-controlled behaviour— in other words, it proceeds regardless.

– Habits

Skinner, a psychologist, taught a cat to play the piano. He did this by rewarding the cat for desired behaviour. This did not mean rewarding the cat each and every time it executed the trainer’s intentions. If desired behaviour were entirely dependent on rewards, it would be no different from getting a drink from a vending machine.

There is a way to use rewards effectively such that the desired behaviour remains even long after the rewards stop coming. This is to time rewards on what is known as a variable ratio schedule (Cherry, 2020).

Let us say A begins with the objective of getting B to perform some action that B would not otherwise perform of B’s own volition. Essentially, A has to be very predictable with rewards in the beginning to train an association between an action (what A wants B to perform) and an outcome (what A gives B for performing). Once B has become familiar with performing according to A’s wishes, A begins to taper off the rewards until no rewards are forthcoming.

If B were a vending machine, the relationship between A and B would effectively cease when the rewards cease. How does A ensure B continues performing?

In the beginning, every instance of Behaviour X would be followed by a reward. After Behaviour X has been entrenched, A stops the rewards for a period. B would continue Behaviour X, seeking the reward. If enough time passes with no reward, B would learn that Behaviour X is no longer rewarding and “contingency degradation, whereby the connection between an action and an outcome is degraded” (Robbins & Costa, 2017) would occur. Once this happens, B would no longer continue as A would like. So, just before contingency degradation can occur, A rewards B again. Behaviour X is now reinforced and B would think that the input-output or response-outcome causality link holds and continue working for the reward. The rewards will not follow in a predictable fashion. B would simply begin to think that so long as performance continues, a reward would appear somewhere down the road. Indeed, it does, sometimes after a few instances of Behaviour X and at other times only after an extended period. This is what “variable” in “variable ratio schedule” means. The schedule on which the rewards are given has to be varied.
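To make the idea of a varying schedule concrete, here is a minimal sketch in Python. It is an illustration only, not a model taken from the cited sources: the function names are hypothetical, and drawing the number of responses required for each reward uniformly from 1 to 2 × mean_ratio − 1 is an assumption chosen so that the average works out to mean_ratio. A fixed ratio schedule is included for contrast, since it behaves like the vending machine.

```python
import random


def fixed_ratio_schedule(ratio, n_responses):
    """Reward exactly every `ratio`-th response (the predictable case)."""
    return [(i + 1) % ratio == 0 for i in range(n_responses)]


def variable_ratio_schedule(mean_ratio, n_responses, seed=0):
    """Reward after an unpredictable number of responses.

    Assumption (for illustration only): the number of responses needed
    for each reward is drawn uniformly from 1 .. 2 * mean_ratio - 1,
    which averages out to mean_ratio.
    """
    rng = random.Random(seed)
    rewarded = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count >= required:
            rewarded.append(True)
            count = 0
            # Draw a fresh, unpredictable requirement for the next reward.
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewarded.append(False)
    return rewarded


def gaps_between_rewards(rewarded):
    """Count how many responses were needed for each reward."""
    gaps, count = [], 0
    for got_reward in rewarded:
        count += 1
        if got_reward:
            gaps.append(count)
            count = 0
    return gaps


if __name__ == "__main__":
    print("fixed:   ", gaps_between_rewards(fixed_ratio_schedule(5, 40)))
    print("variable:", gaps_between_rewards(variable_ratio_schedule(5, 40)))
```

Under the fixed schedule every gap is the same, so B can tell at once when the rewards have stopped. Under the variable schedule the gaps differ from one reward to the next, so a long dry spell looks no different from an ordinary wait.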

This method of conditioning has been proven to be very effective in creating lasting behavioral changes. In layman’s terms, it is to habituate someone to do something. This method works especially well on the conscientious and doggedly determined who have formed the belief that success comes after a lot of hard work.

When we think of habits, we think of good habits and bad habits and there could be an assumption that we have control over our habits. Once a habit has been formed, it would be very uncomfortable to change. This is why Walter in Breaking Bad tells his son Walter Junior not to use both feet on his car pedals.

A habit is a representation of a stimulus-response link which does not “refer to goals” and is “in a sense directly elicited by the environmental states or stimuli and contexts” (Robbins & Costa, 2017). Habits could have nothing to do with goals. They could proceed automatically upon encountering cues despite conscious intentions. Indeed, Wood and others (2021) say, “Habit automaticity also persists despite people’s intentions to act in other ways”.

We know instinctively and through hearing from parents that “habit memories form gradually with repetition” (Wood et al., 2021). So, attention is paid to what we choose to do. We try not to repeat actions which could be rewarding in the moment but would cause loss over time. To this extent, control over habits is assumed.

However, it should be highlighted that habits can be inculcated from the outside. Habits do not refer merely to things like brushing teeth before bed, sleeping time and reading or exercise. They refer to any action that is done automatically on cue including performing in accordance with someone else’s expectations whatever those may be. If A manages to habituate B, A might feel a power rush. Habituating someone else can be intrinsically rewarding because it reinforces a sense of self.

An exceptionally lethal combination, or the perfect trainer-trainee pair, arises when the trainer likes a sense of challenge and chooses a trainee who appears to cherish autonomy, and when the trainee is the kind of person who believes you only fail when you stop trying. The challenge for the trainer can be framed this way: How little can I give for the trainee to persist in what I want the trainee to do? Someone who manages to get a trainee to perform spectacular stunts for next to nothing has perfected the art and science of habituating.

Two other ways of habituating are as follows.

Let us say A wants B to take something B would not ordinarily agree to taking. A has to pair what B wants with what B does not want. Each time B gets what it does want, A presents it together with what B does not want. For the sake of getting what B does want, B becomes habituated to take also what B does not want.

Another way of making B take what it does not want is through the principle of “spaced practice” (Robbins & Costa, 2017). Spaced practice can be contrasted with massed practice, which is to introduce a stimulus frequently in a short span of time. Spaced practice is to introduce the stimulus slowly but surely over a long period of time at regular intervals. If B does not like what A wants B to endure, there might be an adverse reaction in the first instance and in the short term. A skilful trainer would not engage in an obvious confrontation. Instead, a skilful A would appear to remove the stimulus which B does not want in the short term. However, A would reintroduce this unwanted stimulus after enough time has passed. If A is very patient, having conquered all other mountains, A would repeat this cycle of presenting the stimulus, withdrawing it upon encountering resistance and then reintroducing it after some time, for a very long time. Eventually, if B needs to deal with A, B’s resistance would decline and B might even come to feel weird if the stimulus is not presented.

While initially habits are reinforced because of the reward at the end of the performance, they become self-reinforcing or rewarding in themselves. This is how self-destructive habits persist. This is because of the way the brain is wired.

Neuroscientists have shown “the existence of different controllers of action (or at least different neural circuits)” for “stimulus-response and goal directed learning” (Robbins & Costa, 2017). Wood and others (2021) explain that, after someone has become habituated, “activation peaked in the lateral putamen, an area associated with habitual responding, but not the caudate, an area associated with declarative memory involved in conscious goal pursuit”. Declarative memory can be distinguished from procedural memory. The former has to do with recalling and selecting from a number of available options for goal achievement. The latter has to do with executing a series of steps through “top-down control” thus “freeing up cortical processing time for novel and important situation” (Robbins & Costa, 2017).

This means that, for seemingly bizarre reasons, one could find oneself performing a certain set of actions automatically (because one’s brain has been trained that way) in the presence of some environmental stimulus. For instance, A could have experienced exceptional success with a tough nut to crack in the past. A might miss the feeling of accomplishment, and when A encounters another seemingly tough nut, A automatically performs the actions which were successful in the past, despite having no conscious intention to control.

The goal-seeking circuit is distinct from the habit-performing circuit. If a habit has not been trained over an extended period of time and if the reward is not particularly salient, the goal-seeking circuit could take over control. To illustrate this, Wood and others (2021) cite Kruglanski and Szumowska’s (2020) example of a professor who “tries to enter his office building through the doorway he has used for years, despite knowing that the entry was recently closed for renovations” and on “realizing his mistake, he alters course toward a still-open doorway”. However, some habits lead to rooms with doors that lock firmly shut after entry.

Therefore, it is important to recognise the process of habituation. If a pattern is spotted, one has to exercise cognitive control while it is still possible to do so. Some ways of doing this are as follows.

First, the reward could be devalued. Alternatively, the link between the response and the outcome can be devalued. Both of these, according to Robbins and Costa (2017), require “cognitive processing”. This means repeatedly bringing to mind all the negative qualities of what was once considered a reward. This will take effort and one must stay the course until the Gordian knot comes loose.

Wood and others (2021), with unmatched public spiritedness, share two other ways. The first of these is to implement “changes in cues” which “should impede response activation so that the practiced response is not brought quickly to mind”. This means that if one knows a particular environment is not conducive to autonomy, changing the environment will stop it from activating the learned response sequence. The other way is to reduce “stressors, fatigue, and time pressures” because these “impede goal processes” but “leave habit processes relatively intact”.

Press the bell if you don’t want to hear it.

The Brain Dojo
