
## Q: Repeated ordered sequence search algorithm

I have a large ordered sequence of symbols (millions of symbols). I need to find repeated ordered subsequences such that:

- The subsequences to search for are not known in advance; I need to find subsequences that repeat elsewhere in the large sequence.
- The subsequences may differ from one another, for example through some amount of noise or the absence of some symbols.

Optional condition:

- The subsequences may contain a small number of permutations of neighboring symbols.

The alphabet consists of thousands of symbols. Can you recommend a well-known, well-studied algorithm for this task?

Answer 1:

You can try Aho-Corasick multiple-pattern matching with a wildcard to search for substrings. For subsequences you will also want the Levenshtein distance. You can try my PHP implementation of the Aho-Corasick algorithm with wildcards at https://phpahocorasick.codeplex.com.
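
To make the Levenshtein part concrete, here is a minimal Python sketch (not the PHP implementation linked above). It assumes symbols are hashable values such as integers, and that a candidate pattern is already known, which is not the case in the question, so it only illustrates the scoring step. The function names `levenshtein` and `approx_end_positions` and the threshold parameter `max_dist` are hypothetical, chosen for illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance between two symbol sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to b[:j]
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def approx_end_positions(
    pattern: Sequence[Hashable],
    text: Sequence[Hashable],
    max_dist: int,
) -> List[Tuple[int, int]]:
    """Return (end_index, distance) for each text position where some
    substring ending there is within max_dist edits of pattern.

    Same dynamic program as above, except the cost of the empty pattern
    prefix is always 0, so a match may start anywhere in the text.
    """
    col = list(range(len(pattern) + 1))
    hits = []
    for end, t in enumerate(text):
        new = [0]  # a match may begin at any text position
        for i, p in enumerate(pattern, start=1):
            new.append(min(
                col[i] + 1,             # t is an extra (noise) symbol
                new[i - 1] + 1,         # p is missing from the text
                col[i - 1] + (p != t),  # substitute, or free match
            ))
        col = new
        if col[-1] <= max_dist:
            hits.append((end, col[-1]))
    return hits


if __name__ == "__main__":
    pattern = [10, 20, 30, 40]
    # First occurrence has a noise symbol (99); second is missing 20.
    text = [5, 10, 20, 99, 30, 40, 7, 10, 30, 40, 8]
    print(approx_end_positions(pattern, text, max_dist=1))
```

Each scan costs time proportional to `len(pattern) * len(text)`, so on a sequence of millions of symbols this is only practical for verifying a limited set of candidate patterns, not for trying every possible subsequence.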

algorithm sequence data-mining dynamic-programming bioinformatics