Wednesday, October 1, 2014

Substring with Concatenation of All Words

 from leetcode discussion
Brute force solution takes O(n* l *w) time and O( l * w) auxiliary space. Here note 'n' is number of characters in the string ,'l' is the number of words in the list. and 'w' is the words length A better solution is done by taking advantage of the fact that the continuous words will be just the same for two valid sub-strings that are distant just by w characters
The brute force approach stops at each character and checks for all the words are in the list each time constructing the dictionary again and again O(n) times.
The words are all of the same width, Hence the dictionary will be the same for overlapping valid substrings (Ex: "foobarisbarfoobar" ["foo","bar"]) in which the "barfoobar" has overlapping valid substrings "barfoo" and "foobar" and both have same dictionary.Thus we need not construct the dictionary again, instead reuse the dictionary with some proper cases.
Secondly instead of traversing the string character by character we can traverse it in the form of words by words( w increments) fot the above purpose. then do the same for the a character shifted to find out the valid strings there.
Ex: the same above one will be traversed as "foo" "bar" "isb" "arf" "oob" "ar" on the first iteration and then "oob" "ari" "sba" "rfo" "oba" "r" and next "oba" "ris" "bar" "foo" "bar" next.
This will reduce the need for construction of Dictionaries O(n) times instead we do it O(w) times. and we will be making a total of  O(w) iterations with O(n) string comparison operations each time, hence a time complexity of O(nw) and an auxiliary space of O(lw)

The algorithm works like this:
    a) initially store the number of occurences of each word in Hash table "cn".
    b) For i = 0 to w
          start = i;
          For w : all words at intervals i, i+w, i+2w, .. n
           i) check if word w is  valid, if not set start to next word as the substring uptil now will be invalid.

          ii) if word is valid, check the number of occurences of this word in the substring  is less than the number that is needed . if yes, increment  count of the word in the hash table cntL

        iii) if the word is valid , but already occured maximum number of times, then consider the substring starting after the first occurence of this word. set start.

        if all the dictionary words are used then store the start as the valid substring start.

code:
class Solution {
private:
vector<int> res;
map<string,int> cntL;
map<string,int> cn;
int n ;
public:
vector<int> findSubstring(string S, vector<string> &L)
{   res.clear();
    cntL.clear();
    cn.clear();

    n = S.length();
    int e = L.size();
    int t = L[0].length();

    for(int i = 0; i < e ; i++){  
         cn[L[i]]++;
    }

    string tr ,du;
    int cnt = 0;
    int st = 0;

    for(int j = 0 ; j < t ; j++)
    { cnt = 0; st = j;
      for(int i = j; i < n; i += t)
        {     tr = S.substr(i,t);
              if( cn.count(tr) == 0 || cn[tr] == 0 )
              { cntL.clear();
                cnt =  0;
                st = i+t;
              }
              else if(cntL[tr] < cn[tr])
              { cntL[tr] += 1;
                cnt++;
              }
              else
              {  du = S.substr(st,t);
                 while(du != tr)
                 {   cntL[du]--;
                     cnt--;
                     st += t;
                     du = S.substr(st,t);
                 }
                 st += t;
              }
             if(cnt == e)
              {   res.push_back(st);
                  du = S.substr(st,t);
                  cntL[du]--;
                  cnt--;
                  st += t;
              }

         }
         cntL.clear();
      }
    sort(res.begin(),res.end());
    return res ;   
 }
};
//brute force, from anniekim

    vector<int> findSubstring(string S, vector<string> &L) {
             vector<int> res;
        if (S.empty() || L.empty()) return res;
        int M = S.size(), N = L.size();
        int K = L[0].size();
        unordered_map<string, int> need;
        unordered_map<string, int> found;
        for (int i = 0; i < N; ++i)
            need[L[i]]++;
        for (int i = 0; i <= M - N * K; ++i)
        {
            found.clear();
            int j;
            for (j = 0; j < N; ++j)
            {
                string s = S.substr(i + j * K, K);
                if (need.find(s) == need.end())
                    break;
                if (need[s]<= found[s])
                    break;
                found[s]++;
            }
            if (j == N) res.push_back(i);
        }
        return res;
    }  
   

No comments:

Post a Comment