SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. Time series analysis, as an application for high dimensional data mining, is a common task in biochemistry, meteoro- logy, climate research, bio-medicine or marketing. Similarity search in data with increasing dimensionality results in an exponential growth of the search space, referred to as Curse of Dimensionality. A common approach to postpone this ef- fect is to apply approximation to reduce the dimensionality of the original data prior to indexing. However, approxim- ation involves loss of information, which also leads to an exponential growth of the search space. Therefore, index- ing an approximation with a high dimensionality, i.e. high quality, is desirable. We introduce Symbolic Fourier Approximation (SFA) and the SFA trie which allows for indexing of not only large data- sets but also high dimensional approximations. This is done by exploiting the trade-off between the quality of the ap- proximation and the degeneration of the index by using a variable number of dimensions to represent each approxim- ation. Our experiments show that SFA combined with the SFA trie can scale up to a factor of 5–10 more indexed di- mensions than previous approaches. Thus, it provides lower page accesses and CPU costs by a factor of 2–25 respect- ively 2–11 for exact similarity search using real world and synthetic data.
Keywords for this software
References in zbMATH (referenced in 5 articles )
Showing results 1 to 5 of 5.
- Abdallah, Zahraa S.; Gaber, Mohamed Medhat: Co-eye: a multi-resolution ensemble classifier for symbolically approximated time series (2020)
- Shifaz, Ahmed; Pelletier, Charlotte; Petitjean, François; Webb, Geoffrey I.: TS-CHIEF: a scalable and accurate forest algorithm for time series classification (2020)
- Le Nguyen, Thach; Gsponer, Severin; Ilie, Iulia; O’Reilly, Martin; Ifrim, Georgiana: Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations (2019)
- Schäfer, Patrick: Scalable time series classification (2016)
- Schäfer, Patrick: The BOSS is concerned with time series classification in the presence of noise (2015)