Meeting #4 - Stephen Grossberg
Monday, 4/18/2022
We are extremely honored to announce that we hosted Stephen Grossberg, a founder of the fields of theoretical psychology and cognitive science, computational neuroscience, and biologically-inspired technology, as our speaker for Journal Club on 4/18/22.
Brief Background
After becoming the first joint undergraduate major in mathematics and psychology at Dartmouth, and then studying mathematics as a PhD student at Stanford and Rockefeller, he became an assistant professor of applied mathematics at MIT, where he continued to discover and develop neural models of how brains make minds. Unlike Grossberg, most researchers in the 1970s did not believe in this mysterious domain of “computational neuroscience”.
Grossberg next moved to Boston University, whose President and Provost believed in his vision and awarded him an endowed Chair in cognitive and neural systems. There, he established the Department of Cognitive and Neural Systems, which he and many gifted colleagues developed into the world’s leading graduate department for theoretically explaining how brains make minds, carrying out experiments to test these explanations and predictions, and applying these insights in engineering, technology, and AI. He is the foundational architect of the field, and most of his work relates to what we do at Interactive Intelligence. In particular, his models help to explain essentially all the fundamental brain processes that make us human and provide a blueprint for an AI with human-level intelligence.
Please review his biography here.
Recording
View the full transcript here.
Access the video in a separate tab.
Check out some clips, too.
Slides
Access slides in a new tab here.
Discussion Notes
How would you explain your work to someone who doesn’t have any prior knowledge?
- Work begins in 1957 - an Introductory Psychology course at Dartmouth College.
- Paradoxes in psychological data forced the introduction of nonlinear differential equations to explain how brains make our minds.
- Equations for neuronal activation, short-term memory, learning in memory, etc.
- How does brain activity control psychological behavior?
- Conscious Mind, Resonant Brain - interactions within several regions, with emergent properties: language, emotion, etc.
- Models are central to explaining emergent properties.
- We start with many psychological experiments: the art of modeling is to think about the data long enough that you can imagine it as being created by an individual mind acting autonomously. There is no algorithm for it; it is speculative.
- The method allows you to discover underlying design principles that can be converted into mathematical models that embody the principles, then use the models to further explain psychological data.
- Many of the models looked like neural networks. Understand the limits of modeling and how better models can be built.
- The modeling cycle leads towards increasingly accurate models with greater explanatory power. Grossberg has gone through this cycle many times. The Grossberg Modeling Cycle.
- You must develop experimental intuition - to understand the mind, the brain, or both - fall in love with data you desperately want to understand. You need to put in the work needed to figure it out.
- How do we learn or memorize lists of events?
- Psychological data about learning the alphabet.
- Led to equations for short-term and long-term memory.
- Excited by data - a paradox! There are more errors in the middle of the list than at the beginning or the end; the beginning and the end are easier to learn than the middle. One would expect errors to accumulate towards the end of the list. Instead, it gets harder, then easier. If you increase the amount of rest, the entire distribution changes. The non-occurrence of a future item (i.e., a longer rest) can influence the learning of lists. Events can influence each other backwards in time.
- Backwards learning. Recurrent interactions between signals: we need a network with nodes representing A, B, C, with associations between each of the nodes, and so on - forced into neural networks. The rate at which things happen influences the learning; differential equations are used to represent the time scales (see the sketch after this list).
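As a rough illustration (our addition, not equations shown in the talk), Grossberg’s classical networks pair a fast short-term memory (STM) equation for node activities with a slow long-term memory (LTM) equation for the adaptive weights that gate recurrent signals between nodes:

```latex
% Sketch of the classical Grossberg STM/LTM equations (our notation):
% x_i  = activity (STM trace) of the node coding list item i, e.g. a letter
% z_ij = adaptive weight (LTM trace) of the association from node i to node j
% f    = signal function, I_i = external input, A, B = decay rates
\begin{aligned}
\frac{dx_i}{dt} &= -A\,x_i + \sum_{k} f(x_k)\,z_{ki} + I_i
  &&\text{(fast time scale)}\\[4pt]
\frac{dz_{ij}}{dt} &= -B\,z_{ij} + f(x_i)\,x_j
  &&\text{(slow time scale)}
\end{aligned}
```

Because every node sends z-gated recurrent signals to every other node, associations run both forward (A→B) and backward (B→A), which is how a later event - or its non-occurrence during rest - can reshape what was learned earlier.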
Could you say more about your new book and its contents?
- Was written for the general public. It’s challenging but self-contained, as nontechnical as possible, written in a conversational tone, and written wherever possible as a series of stories.
- Provides principled and unifying explanations for the data in hundreds of psychological and neurobiological experiments.
- If you work hard to understand the underlying principles in a scientific process, it will give you more than you expected it to explain.
- Explaining large amounts of data about typical and atypical behaviors allows for the formation of mechanistic explanations for many mental disorders. The gift that keeps on giving.
- Grossberg does not necessarily try to understand consciousness, but learning.
- Chapter 17 is more speculative and discusses the origin of creativity, morality, religion, superstition, self-defeatism, etc.
- The book tries to clarify foundations of the main processes by which we know the world.
- Science is never finished.
Do you think that there is a biological mechanism of backpropagation or deep learning?
- Short answer: backpropagation and deep learning are feed-forward adaptive filters: signals go from one place to another along adaptive pathways.
- They artificially move adapted weights from the location where they were learned to where they are needed to filter the inputs. Nonlocal weight transport (see the sketch below).
- Nonlocal transport has no analog in the brain. In the brain, all learning is local.
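A minimal numpy sketch of the weight-transport point (our illustration; the tiny network and variable names are invented for the example): backprop’s hidden-layer error signal reuses the transpose of the forward weights, so the feedback pathway must carry exact copies of weights learned elsewhere, whereas a local Hebbian-style update uses only the activities of the two cells a weight connects.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # input activities
W1 = rng.normal(size=(4, 3))     # forward weights, layer 1
W2 = rng.normal(size=(2, 4))     # forward weights, layer 2

h = np.tanh(W1 @ x)              # hidden activities
y = W2 @ h                       # output
err = y - np.array([1.0, 0.0])   # output error

# Backprop: the hidden error signal needs W2.T, a synapse-by-synapse
# copy of forward weights "transported" into the feedback pathway.
delta_h = (W2.T @ err) * (1.0 - h**2)
grad_W1_backprop = np.outer(delta_h, x)

# Local alternative (Hebbian/instar style): each weight change uses only
# the pre- and postsynaptic activities at that synapse -- no transport.
grad_W1_local = np.outer(h, x)
```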
How does deep learning differ from Adaptive Resonance Theory?
- How do brains self-organize? How does a cell develop into a learning machine?
- Adaptive Resonance Theory (ART) - claimed by Grossberg to be the most advanced cognitive and neural theory about how our brain learns to attend, recognize, and predict objects and events in a changing world.
- Explanatory and predictive success.
- Unlike deep learning, ART is not an adaptive feedforward filter. It’s not a principle or an algorithm; it is a theory that intelligence is embodied by an explainable production system that carries out hypothesis testing, self-stabilizing learning, and classification and prediction in a nonstationary world.
- Deep learning dies in a nonstationary world.
- Deep learning is untrustworthy because it’s not explainable, and unreliable because it experiences catastrophic forgetting.
- Backpropagation has 17 problems that ART doesn’t (shown in a paper from the late 1980s).
- ART has short-term memory traces, and thus can focus attention on a set of critical features that control predictive success.
- Anyone looking at an ART network knows the information that it’s using to make its predictions, so you can explain how a prediction is made. You can’t do the same with deep learning, Grossberg claims.
- Catastrophic forgetting - deep learning uses slow, supervised learning; an unpredictable part of its memory can crash and burn because it is a feedforward adaptive network.
- Untrustworthy: not explainable, not reliable, experiences catastrophic forgetting.
- ART has been confirmed in subsequent experiments, and has provided unifying explanations for thousands of additional experiments.
- ART was not created fully formed; it has evolved incrementally.
- Why should you believe ART? A profound reason: in 1980, Grossberg derived ART from a thought experiment about how any learning system can autonomously correct predictive errors in an unstable world.
- When you can derive a theory from a thought experiment, the hypotheses are premised on facts familiar to everyone because they represent ubiquitous environmental axioms. You don’t need to reach for a psychology or neuroscience book; the hypotheses are trivial. What makes the result powerful is that a few such constraints force ART to happen.
- Stability-Plasticity Dilemma: how can any system learn quickly without experiencing catastrophic forgetting? Plasticity is fast learning; stability is the absence of catastrophic forgetting.
- ART results about learning led Grossberg to explain how and why we have consciousness - why we use conscious states to plan goals.
- Conscious states arise from resonances - when excitatory feedback signals between two brain regions approximately match signal patterns well enough to cause active cells to synchronize. The excitatory feedback is sustained long enough to trigger a conscious state, leading to learning. Resonance causes the adaptive state.
- Top-down expectations solve the Stability-Plasticity Dilemma: not a feed-forward adaptive theory - but a resonating top-down system.
- Proposed solution to the Mind-Body Problem arises from an analysis of how we autonomously learn in a changing world without catastrophic forgetting.
- How do thinking and feeling interact?
- The Cognitive Emotion Model
- Thought experiments never mention the words mind or brain; they provide a blueprint for intelligent applications. Not about mind and brain, but instead about adapting in real-time to constraints.
- What did Grossberg do, in essence? He introduced the revolutionary paradigm of autonomous adaptive resonant intelligence. AI and technology will increasingly follow this scheme.
- To make this work, top-down expectations are defined by a circuit which avoids catastrophic forgetting via specific design: obey the ART-matching rule.
- Realized by a specific anatomical circuit that exists in multiple species.
- Top-down, modulatory on-center, off-surround network.
- On-center cells are excited by top-down inputs, but only primed, sensitized, modulated. Primed to expect something.
- The off-surround inhibits cells around the modulatory on-center.
- The modulatory on-center primes you; these primed cells selectively encode features.
- Critical features drive predictions and action; they are explainable because you can actively record them.
- ART-Matching rule enables top-down expectations to select critical feature patterns that control the success of prediction.
- You need short-term memory cell activations, which deep learning doesn’t have.
- Outlier features are inhibited; only critical features are learned. Outlier features cannot cause catastrophic forgetting (see the toy sketch after this list).
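A toy sketch of the ART Matching Rule for binary patterns (our simplification, in the spirit of ART 1’s 2/3 rule; not code from the talk): a top-down expectation alone only primes, bottom-up input alone can activate, and when both converge, only the expected features that are actually present stay suprathreshold.

```python
import numpy as np

def art_match(bottom_up=None, top_down=None):
    """ART-Matching-Rule sketch for binary feature patterns.

    - top-down alone: modulatory priming only, no suprathreshold output
    - bottom-up alone: can activate features on its own
    - both: the on-center keeps features confirmed by the expectation,
      the off-surround suppresses unmatched (outlier) features
    """
    if bottom_up is None:
        return np.zeros_like(top_down)        # priming is subthreshold
    if top_down is None:
        return bottom_up.copy()               # bottom-up drives activity
    return np.minimum(bottom_up, top_down)    # matched critical features

I = np.array([1, 1, 0, 1])   # bottom-up input
E = np.array([1, 0, 0, 1])   # learned top-down expectation
print(art_match(I, E))       # [1 0 0 1]: the unexpected feature at index 1
                             # is inhibited and cannot overwrite old memories
```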
How does learning get started?
- If there are no learned top-down expectations, how do top-down expectations match a pattern before anything is learned?
- Grossberg’s prediction, consistent with his models: it is done by choosing the initial top-down adaptive weights.
- Cells send signals down axons, gated by long-term memory trace adaptive weights; the adaptive weights are chosen to be uniformly distributed from each category cell to all of the target feature pattern cells.
- The top-down weights are uniform and can therefore match any input before learning starts.
- Learning prunes these weights down to match the critical feature patterns that are incrementally discovered via bottom-up learning.
- Bottom-up weights gradually emerge on a stable set of critical features.
- Top-down uniform weights allow you to match anything, to get started. If you mismatch on the first trial, you can never obtain resonance.
- Fuzzy ARTMAP - no longer just category learning; it becomes a predictive system (minimal sketch below).
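For concreteness, here is a minimal Fuzzy ART sketch (our implementation of the published Fuzzy ART equations, not Grossberg’s code). `rho` is the vigilance parameter from the stability-plasticity discussion, and committing a new category with fast learning is equivalent to learning into an uncommitted node whose top-down weights start uniform, so any input can match before learning begins.

```python
import numpy as np

def complement_code(a):
    # Fuzzy ART preprocessing: [a, 1 - a] keeps every input's city-block
    # norm constant, which prevents category proliferation.
    return np.concatenate([a, 1.0 - a])

class FuzzyART:
    def __init__(self, n_features, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: how strict the match test is
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.w = np.empty((0, 2 * n_features))  # no committed categories yet

    def train(self, a):
        i = complement_code(np.asarray(a, dtype=float))
        overlap = np.minimum(i, self.w).sum(axis=1)      # |I AND w_j|
        T = overlap / (self.alpha + self.w.sum(axis=1))  # category choice
        for j in np.argsort(-T):                         # search best-first
            # Vigilance (match) test: resonate iff |I AND w_j| / |I| >= rho.
            if overlap[j] / i.sum() >= self.rho:
                self.w[j] = (self.beta * np.minimum(i, self.w[j])
                             + (1.0 - self.beta) * self.w[j])
                return j                                 # resonance: learn
        # Mismatch everywhere: commit a new category. With beta = 1 this
        # equals learning into an uncommitted node whose top-down weights
        # start uniform (all ones), so any input can match before learning.
        self.w = np.vstack([self.w, i])
        return len(self.w) - 1
```

For example, with `FuzzyART(n_features=2, rho=0.8)`, inputs `[0.1, 0.9]` and `[0.12, 0.88]` resonate with the same category, while `[0.9, 0.1]` fails the vigilance test and recruits a new one; raising `rho` makes matching stricter, trading more categories for finer discrimination.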
Does ART have an explanation for what each cortical column in the neocortex does as a module?
- The answer is yes.
- Took decades to obtain.
- LAMINART model: provides a detailed explanation of how cells in cortical columns operate.
- All of the neocortex has laminar circuits, typically with 6 main layers, supporting perceptual and cognitive circuits.
- The paradigm of laminar computing begins to explain how all prior forms of biological intelligence were generated by variations of a single laminar circuit.
- In particular, laminar models of vision, speech perception, and cognition all use variations of the same canonical laminar cortical circuit.
- Existence proof shows how different tasks can all be explained from specializations of the same canonical circuit.
- At a high level, what does laminar computing do?
- Properties:
- A developmental learning process by which circuits are shaped to match environmental constraints and dynamically maintain them (solving the Stability-Plasticity problem).
- A binding process with different appearances of grouping; all use variations of the same canonical circuit (the distributed binding problem).
- An attentional process by which the cortex selectively processes important events.
- Solving the first property gives the second and third for free. One needs to solve the learning problem itself.
- Another key property: Complementary Computing.
What has ART not explained to Grossberg’s satisfaction about the brain?
- ART is just a small part of Grossberg’s work.
- Broader perspective of work will be given later.
- He would like a better understanding of language in its full complexity.
- Imitation despite different perspectives - joint attention, a problem in autism. Many autistic individuals cannot perform joint attention.
- Language, music, social cognition.
Can we reach general AI by simulating the brain, or by simulating the evolution of the brain?
- The models don’t mention mind or brain.
- Environments shape the modeling of our brain, universal models of how any system can autonomously correct predictive errors in a changing world.
- The key point is autonomy, if you want AI to emulate human intelligence.
What is the next big milestone task for AI to accomplish?
- AI doesn’t have to restrict itself to biological intelligence. If AI has a completely different set of goals and doesn’t care about biology, then how are you going to define intelligence?
- Read thought experiments - if you can’t get out of them, you’re stuck with them.
- Grossberg has worked through multiple fads - 20 or more over several decades - that hit a brick wall and were forgotten. But he has never hit a brick wall.
- Worry about the foundations so you don’t hit a brick wall if you want a successful research trajectory.
- Use people like Grossberg as a launching pad without hitting a brick wall.
- Scientific politics: there is almost instantaneous communication, but the signal-to-noise ratio is very low. There’s a lot of noise; the market is exceptionally loud.
- Backprop and deep learning have sold their wares in a way that Grossberg considers intellectually dishonest. The work is untrustworthy and unreliable, yet it is advertised as how the brain works. “It’s just… unfortunate.” Ignorant or dishonest marketing, all the things that can attract your attention.
- Grossberg’s work has been applied to many cases, but it’s been lost in the noise because there isn’t a buzz word - you have to study it.
- Grossberg as an Einstein of the mind - why don’t we know about Grossberg? We live in a world with a low signal-to-noise ratio, and the nature of the mind, like much of Einstein’s work, was unappreciated during his lifetime.
- Cognitive impenetrability - people didn’t know that the mind was in the brain; they thought it was in the pancreas, the heart, etc. We don’t need to worry about the components - all we need to worry about are thoughts, feelings, and so on - we can live in a macroscopic world.
See additional student questions in the recording!
AI History and Development
“I couldn’t care less about deep learning! Deep learning is a fad.” - Grossberg.
- Backpropagation has problems - for one, no explainable internal representations. Backpropagation faded, and other models became more popular.
- In the interim, thanks to the World Wide Web, there were large databases that you could query.
- The speed of networks became blindingly fast.
- Large databases and fast computation allow for the training of backpropagation networks on millions of pictures.
- Animal/object classification is a “joke!” to the broader field of learning.
- Deep learning is a weak, damaged model that teaches Grossberg nothing.
- Deep learning isn’t bad for learning some sorts of predictions for applications; it can be useful, so use it. But do not delude yourself about what deep learning is. Know the strengths and weaknesses of the models you use.
- Intellectual dishonesty of how some people describe the significance of deep learning. Deep learning is never compared to benchmarks, according to Grossberg.
- Catastrophic forgetting can be avoided by sparsification of the network, which has been tried in neural networks. You don’t need sparsification if you have self-stabilizing models. We don’t have 100 layers in our full brain, not even for classification or recognition.
- Deep learning is untrustworthy and unreliable in a specific scientific sense.
Advice for Students
If you want to dedicate your life towards something, you must be dying to know the answer and fall in love with the data.
Resources and Further Reading
Work | Description
---|---
Conscious Mind, Resonant Brain: How Each Brain Becomes a Mind, Oxford University Press, 2021. | “This book provides an introductory and self-contained description of… modern theories of mind and brain have recently proposed… Accessibly written, and lavishly illustrated, Conscious Mind/Resonant Brain is the magnum opus of one of the most influential scientists of the past 50 years, and will appeal to a broad readership across the sciences and humanities.”
“Towards Solving the Hard Problem of Consciousness: The Varieties of Brain Resonances and the Conscious Experiences that they Support”, TSC 2017. |
“From Designs for Autonomous Adaptive Agents to Clinical Disorders: Linking Cortically-Mediated Learning to Alzheimer’s Disease, Autism, Amnesia, and Sleep”, WCCI Keynote, 2020. |