Improve Long-term Memory Learning Through Rescaling the Error Temporally
Abstract
This paper studies the selection of error metrics for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including mean absolute error and mean squared error. Our findings show that all temporally positive-weighted errors are biased towards short-term memory when learning linear functionals. To reduce this bias and improve long-term memory learning, we propose the use of a temporally rescaled error. In addition to reducing the bias towards short-term memory, this approach can also alleviate the vanishing gradient issue. We conduct numerical experiments on different long-memory tasks and sequence models to validate our claims. The numerical results confirm the importance of an appropriately chosen temporally rescaled error for effective long-term memory learning. To the best of our knowledge, this is the first work that quantitatively analyzes the bias of different errors towards short-term memory in sequence modelling.
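As a rough illustration of what a temporally weighted error looks like, the following sketch computes a squared error in which each time step carries its own positive weight. With uniform weights this reduces to the ordinary mean squared error. The function name `temporally_weighted_mse` and the linearly increasing weights are hypothetical choices made for this example; this is not the rescaling proposed in the paper, whose exact form is not given in the abstract.

```python
import numpy as np


def temporally_weighted_mse(pred, target, weights):
    """Weighted mean of per-step squared errors (hypothetical helper).

    The weights must be strictly positive and are normalized to sum to 1,
    so uniform weights recover the ordinary mean squared error.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= 0):
        raise ValueError("weights must be strictly positive")
    weights = weights / weights.sum()
    return float(np.sum(weights * (pred - target) ** 2))


T = 5
pred = np.zeros(T)
# The prediction is wrong only at the final time step.
target = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

uniform = np.ones(T)                            # ordinary MSE
late_heavy = np.arange(1, T + 1, dtype=float)   # assumed weights 1, 2, ..., T

mse_uniform = temporally_weighted_mse(pred, target, uniform)
mse_late = temporally_weighted_mse(pred, target, late_heavy)
print(mse_uniform, mse_late)
```

Here the same late-step mistake contributes more to the loss under the late-heavy weights than under uniform weights, which shows how the choice of temporal weighting changes which time steps dominate training.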