strStr - Sunday Algorithm

Leetcode: strStr - SUNDAY algorithm


Sunday algorithm is a variation of BM algorithm, which is used broadly in string comparation, and is faster and easier than KMP algorithm(usually 3-5 times!).

Imagine we want to find “search” in string “substringsearchingalgorithm”. At beginning, we align substring with text:

substringsearchingalgorithm
search
^

Then we will find mismatch at second character, so we need to shift substring. But how much should we shift? That is the difference of string search algorithms. The naive algorithm is just shift 1 character; KMP uses the information of already matched part; BM algorithm uses reverse comparation. Here, I will introduce a new method, which is look at the followed character of current substring(character ‘i’ in example above).

Obviously, no mattch how much we shift, this character will participate in next comparation. In other words, if we matched in next step, this character must in our substring. So, we can shift substring to make it align with the rightmost same character. As for our example, there does not exist ‘i’ in substring “search”, which means we can jump a long distance, begin compare from the character next of ‘i’, as below:

substringsearchingalgorithm
search
     ^

In this round of comparation, the first charater is not match. Look at the next character behind substring, which is ‘c’, occurs in the last but one place of substring. So we shift substring for 2 characters to make these two ‘c’ aligned.

substringsearchingalgorithm
search
     ^

Hah! This time we find the match! Look back the whole proces, we only take 3 steps to find the matched string.

Java code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public int strStr(String haystack, String needle) {
if(needle==null || needle.length()==0) return 0;
if(haystack == null || needle.length() > haystack.length()) return -1;
char[] mainArr = haystack.toCharArray();
char[] targetArr = needle.toCharArray();
int i = 0;
while(i<mainArr.length){
int j = 0;
while(j < targetArr.length && i+j < mainArr.length && mainArr[i+j] == targetArr[j]){ //match
j++;
}
if(j == targetArr.length) return i;
else { //shift
if(i+targetArr.length < mainArr.length){
for(j = targetArr.length-1; j >= 0; j--){
if(targetArr[j] == mainArr[i+targetArr.length]){
break;
}
}
}
i += targetArr.length-j;
}
}
return -1;
}