MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to virus classification.

In this work, we propose MISSEL (Multiple Sub Sequences Extractor for cLassification), a novel algorithm that relies on a feature selection technique to extract multiple, locally adjacent solutions for supervised machine learning problems. We design the method for biological settings where the relative position of a feature is relevant (e.g., sequence data). Our goal is to find sets of separating features that lie as close as possible to each other. A second crucial goal is to highlight the multiple subsequences that exhibit the same required characteristics. Our approach adopts a fast and effective method to evaluate the quality of subsequences and integrates it into a genetic algorithm.
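
As a rough illustration of what "separating features that are close to each other" can mean, the Python sketch below scans aligned toy sequences for contiguous windows whose content distinguishes one class from all others. It is not the MISSEL algorithm: it swaps the genetic algorithm for an exhaustive scan, and the function name `separating_windows`, the toy sequences, and the disjointness criterion are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def separating_windows(
    seqs: Dict[str, List[str]], target: str, width: int
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of `width` adjacent positions whose
    substrings in the `target` class never occur, at that same span,
    in any sequence of another class.

    Hypothetical helper: assumes the sequences are already aligned.
    """
    # Only scan positions that exist in every sequence.
    length = min(len(s) for group in seqs.values() for s in group)
    hits = []
    for start in range(length - width + 1):
        end = start + width
        inside = {s[start:end] for s in seqs[target]}
        outside = {
            s[start:end]
            for label, group in seqs.items()
            if label != target
            for s in group
        }
        # The window separates the target class if no substring is shared.
        if inside.isdisjoint(outside):
            hits.append((start, end))
    return hits


# Toy data (invented): two classes that differ only at positions 2 and 3.
toy = {
    "virus_A": ["ACGTACGT", "ACGTACGA"],
    "virus_B": ["ACCAACGT", "ACCAACGA"],
}
windows = separating_windows(toy, "virus_A", 3)
print(windows)  # [(0, 3), (1, 4), (2, 5), (3, 6)]
```

Each reported span covers at least one of the two differing positions, so the scan returns several overlapping, locally adjacent candidate subsequences rather than a single one. A search heuristic such as a genetic algorithm becomes relevant when the sequences are too long or the candidate windows too numerous for this kind of exhaustive enumeration.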
