Maximization of Future Internal States?
Open peer commentary on the article “Foresight Rather than Hindsight? Future State Maximization As a Computational Interpretation of Heinz von Foerster’s Ethical Imperative” by Hannes Hornischer, Simon Plakolb, Georg Jäger & Manfred Füllsack. Abstract: The target article outlines a Future-State-Maximization (FSX) approach whose focus on “rewarding” actions that lead to increased action possibilities serves as an alternative to standard value-based learning approaches. In my commentary, I discuss how internal states might shape future action possibilities. Specifically, the notion of allostasis is discussed in relation to how physiological (internal variable) regulation may enable or constrain future action spaces.
Handling Editor: Alexander Riegler
Lowe R. (2020) Maximization of future internal states? Constructivist Foundations 16(1): 060–062. https://constructivist.info/16/1/060
Comment by Hernandez Cerezo Sergio · 15 Apr 2021
In our paper http://arxiv.org/abs/1803.05049 we discuss the need to use a reward associated with each state as a proxy for survival probability.
We experimented with using the agent's energy and health levels (each in the range 0–1), so that the reward of a state is energy_level * health_level. This produces behaviour that maximizes not only future state diversity but also future rewards: the agent chooses actions that increase both its future freedom of action (exploration) and its overall probability of surviving afterwards (exploitation).
In addition to these "intrinsic rewards", one can add custom external rewards (e.g., an extra reward for picking up a rock with a hook) to push the agent to perform any desired task, while keeping its survival probability and freedom of action high.
The pseudo-code referred to in the article corresponds to a simplified version of the algorithm in which rewards are not fully used. Our paper contains a "complete" version of the algorithm (see Section 4.3, "Pseudo-code").
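The reward-weighted future-state idea described in the comment can be sketched roughly as follows. This is a minimal toy illustration, not the authors' actual algorithm: it assumes an agent state is a pair (energy_level, health_level), scores each candidate action by the diversity of end states reached by random rollouts, weighted by their mean intrinsic reward, and uses a hypothetical `toy_step` transition function for demonstration.

```python
import random

def intrinsic_reward(state):
    """Reward of a state: energy_level * health_level, both in [0, 1]."""
    energy, health = state
    return energy * health

def rollout(state, step_fn, actions, horizon, rng):
    """Follow random actions for `horizon` steps; return the final state."""
    for _ in range(horizon):
        state = step_fn(state, rng.choice(actions))
    return state

def score_action(state, action, step_fn, actions,
                 n_rollouts=50, horizon=5, seed=0):
    """Diversity of reachable end states, weighted by their mean reward."""
    # The same seed is reused for every action so that all candidates
    # are compared under common random rollouts.
    rng = random.Random(seed)
    start = step_fn(state, action)
    finals = [rollout(start, step_fn, actions, horizon, rng)
              for _ in range(n_rollouts)]
    diversity = len({(round(e, 2), round(h, 2)) for e, h in finals})
    mean_reward = sum(intrinsic_reward(s) for s in finals) / n_rollouts
    return diversity * mean_reward

def choose_action(state, actions, step_fn):
    """Pick the action with the best diversity-times-reward score."""
    return max(actions, key=lambda a: score_action(state, a, step_fn, actions))

def toy_step(state, action):
    """Hypothetical transition dynamics for a two-resource toy world."""
    energy, health = state
    if action == "eat":
        energy = min(1.0, energy + 0.1)
    elif action == "rest":
        health = min(1.0, health + 0.1)
    else:  # "explore" costs energy but opens up more states
        energy = max(0.0, energy - 0.05)
    return (energy, health)
```

For example, `choose_action((0.2, 0.9), ["eat", "rest", "explore"], toy_step)` ranks the three actions by how many distinct, highly rewarded states remain reachable after each. The product in `score_action` mirrors the comment's point: state diversity alone yields exploration, while weighting by `energy_level * health_level` also keeps survival probability high.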