arxiv:2307.11462

Improve Long-term Memory Learning Through Rescaling the Error Temporally

Published on Jul 21, 2023

Authors:

Abstract

This paper studies the choice of error metric for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including the mean absolute error and the mean squared error. Our findings show that all temporally positive-weighted errors are biased towards short-term memory when learning linear functionals. To reduce this bias and improve long-term memory learning, we propose the use of a temporally rescaled error. In addition to reducing the bias towards short-term memory, this approach can also alleviate the vanishing gradient issue. We conduct numerical experiments on different long-memory tasks and sequence models to validate our claims. The numerical results confirm the importance of an appropriate temporally rescaled error for effective long-term memory learning. To the best of our knowledge, this is the first work that quantitatively analyzes the bias of different errors towards short-term memory in sequence modelling.
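
The abstract does not give the form of the temporally rescaled error, so the following is only an illustrative sketch of the general idea of weighting per-step errors by time. The function names, the polynomial weight t^power, and the normalisation by the sum of weights are all assumptions made here for illustration; they are not taken from the paper.

```python
import numpy as np


def temporally_weighted_mse(pred, target, weights=None):
    """Squared error averaged over time with per-step weights.

    With weights=None this reduces to the ordinary mean squared error.
    (Hypothetical helper, not the paper's definition.)
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    sq_err = (pred - target) ** 2  # per-step squared error, shape (T,)
    if weights is None:
        weights = np.ones_like(sq_err)
    weights = np.asarray(weights, dtype=float)
    # Normalise by the total weight so the scale is comparable to plain MSE.
    return float(np.sum(weights * sq_err) / np.sum(weights))


def polynomial_rescaling(T, power=1.0):
    """Assumed rescaling: weight t**power at step t = 1..T.

    Later steps receive larger weights, so late errors count for more.
    """
    t = np.arange(1, T + 1, dtype=float)
    return t ** power


# Two predictions, each with a single unit error: one early, one late.
T = 4
target = np.zeros(T)
early_error = np.array([1.0, 0.0, 0.0, 0.0])
late_error = np.array([0.0, 0.0, 0.0, 1.0])

w = polynomial_rescaling(T, power=1.0)
uniform_early = temporally_weighted_mse(early_error, target)
uniform_late = temporally_weighted_mse(late_error, target)
rescaled_early = temporally_weighted_mse(early_error, target, w)
rescaled_late = temporally_weighted_mse(late_error, target, w)
```

Under uniform weights the early and late mistakes cost the same (0.25 each for T = 4). Under the assumed linear rescaling the late mistake costs 0.4 and the early one 0.1, which shows how a time-dependent weight can shift the emphasis of the loss towards later steps.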
