Discovering Temporal Patterns for Event Sequence Clustering via Policy Mixture Model
Temporal point processes (TPPs) are an expressive tool for modeling the temporal patterns of event sequences. However, discovering temporal patterns for event sequence clustering has rarely been studied in TPP modeling. To address this problem, we take a reinforcement learning view in which the observed sequences are assumed to be generated from a mixture of latent policies. The goal is to cluster sequences with different temporal patterns into their underlying policies while learning each policy model. The flexibility of our model lies in two aspects: i) all components are neural networks, including the policy network that models the temporal point process; ii) to handle varying-length event sequences, we resort to inverse reinforcement learning, decomposing each observed sequence into states (the RNN hidden embedding of the history) and actions (the time interval to the next event) in order to learn the reward function, which achieves better performance or higher efficiency than existing methods that use rewards over the entire sequence, such as log-likelihood or Wasserstein distance. We adopt an Expectation-Maximization framework: the E-step estimates the cluster label of each sequence, and the M-step learns the respective policy. Extensive experiments on synthetic and real-world datasets show the efficacy of our method against state-of-the-art methods.
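The Expectation-Maximization loop described in the abstract can be illustrated with a deliberately simplified Java sketch. It is not the method of the paper: each latent "policy" here is assumed to be a plain exponential distribution over inter-event intervals with a single rate, which stands in for the RNN policy network, and the closed-form rate update stands in for the inverse-reinforcement-learning step. The class name `PolicyMixtureEmSketch`, its method names, and the toy data are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration: exponential inter-event "policies" stand in
// for the RNN policy networks; this is an assumption, not the paper's model.
final class PolicyMixtureEmSketch {

    /** Log-likelihood of a sequence's intervals under an exponential policy with the given rate. */
    static double logLikelihood(double[] intervals, double rate) {
        double total = 0.0;
        for (double dt : intervals) {
            total += dt;
        }
        return intervals.length * Math.log(rate) - rate * total;
    }

    /** E-step for one sequence: posterior probability of each policy (computed stably in log space). */
    static double[] responsibilities(double[] intervals, double[] weights, double[] rates) {
        int k = weights.length;
        double[] post = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < k; j++) {
            post[j] = Math.log(weights[j]) + logLikelihood(intervals, rates[j]);
            max = Math.max(max, post[j]);
        }
        double norm = 0.0;
        for (int j = 0; j < k; j++) {
            post[j] = Math.exp(post[j] - max);
            norm += post[j];
        }
        for (int j = 0; j < k; j++) {
            post[j] /= norm;
        }
        return post;
    }

    /** M-step: re-estimate mixture weights and per-policy rates in place from the responsibilities. */
    static void mStep(double[][] sequences, double[][] resp, double[] weights, double[] rates) {
        for (int j = 0; j < weights.length; j++) {
            double respSum = 0.0;
            double events = 0.0;
            double time = 0.0;
            for (int n = 0; n < sequences.length; n++) {
                respSum += resp[n][j];
                events += resp[n][j] * sequences[n].length;
                for (double dt : sequences[n]) {
                    time += resp[n][j] * dt;
                }
            }
            weights[j] = respSum / sequences.length;
            if (time > 0.0) {
                rates[j] = events / time; // weighted maximum-likelihood rate
            }
        }
    }

    /** Runs EM for a fixed number of iterations (updating weights and rates in place), then returns hard labels. */
    static int[] cluster(double[][] sequences, double[] weights, double[] rates, int iterations) {
        double[][] resp = new double[sequences.length][];
        for (int it = 0; it < iterations; it++) {
            for (int n = 0; n < sequences.length; n++) {
                resp[n] = responsibilities(sequences[n], weights, rates);
            }
            mStep(sequences, resp, weights, rates);
        }
        int[] labels = new int[sequences.length];
        for (int n = 0; n < sequences.length; n++) {
            double[] post = responsibilities(sequences[n], weights, rates);
            for (int j = 1; j < post.length; j++) {
                if (post[j] > post[labels[n]]) {
                    labels[n] = j;
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy data: two sequences with short intervals, two with long intervals.
        double[][] sequences = {
            {0.1, 0.1, 0.1, 0.1}, {0.12, 0.08, 0.1},
            {2.0, 1.8, 2.2}, {1.9, 2.1, 2.0, 2.3}
        };
        double[] weights = {0.5, 0.5};
        double[] rates = {5.0, 0.5}; // arbitrary initial guesses
        int[] labels = cluster(sequences, weights, rates, 20);
        System.out.println("labels  = " + Arrays.toString(labels));
        System.out.println("weights = " + Arrays.toString(weights));
        System.out.println("rates   = " + Arrays.toString(rates));
    }
}
```

In this simplification the state plays no role, because an exponential policy ignores history. In the setting the abstract describes, the per-interval likelihood would instead depend on the RNN embedding of the preceding events, and the M-step would learn each policy through the learned reward function rather than a closed-form update.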
Branch: CSE
Domain: Data Mining
Developed In: Java