标 题: 发一道G家的onsite题及教训
发信站: BBS 未名空间站 (Sat Jan 17 20:47:56 2015, 美东)
去年的onsite,挂在这题上了。其实不难,之前也有人发过,但好像没详细讨论过。
一组字符串,求所有彼此之间无公共字符的两两组合中,两字符串长度乘积的最大值。
上来就是暴力解O(n^2).问有没有更快的。我问:better than O(n^2)? 对方没正面回答
。结果我以为他是默认了。于是挖空心思的找O(nlogn)的解法,建字符索引,后缀树都
想过。最后没办法,问他有没有hint。结果他提了剪枝。当时我就崩溃了,剪枝早想到
了,但剪枝的话worst case还是O(n^2)啊!我立刻说了按长度排序再剪枝的方法。但是
时间已经不够写代码了。
想问问这题究竟有没有优于O(n^2)的解法。当然,假定比较两字符串的时间设为常数。
我的感觉是没有的。当然,我可能是错的。
教训就是面试是交流的过程,想到什么improvement就说出来讨论讨论,就算他不认可
,至少也知道你想到了一个方法。最忌讳闷头苦想,而面试官根本不知道你在想啥。
review: from here
1
Here's one take at it:
1 - Take each word and convert each into a bitmap by iterating over the characters and flipping a bit to 1 at an offset that corresponds to the position of the letter in the alphabet (e.g. abba = 11, a = 1, bdf = 101010, etc).
2 - perform a binary AND for each bitmap permutation. If ANDing the two bitmaps results in zero, and this match has the largest sum of word lengths so far, keep the word pair.
I believe this is O(n^2)-ish?
2 a more interesting method:
Use to denote the number of words and to denote the size of the alphabet. I will also use to denote maximum word length. Thus, the size of input is and this is also the time complexity of reading it.
The speedup for the naive quadratic-time solution is explained well in other answers. Here, I will explain a better algorithm for the case when the alphabet is small enough. Here, "small enough" will mean "we are able to spend time and memory". Note that for lowercase English letters we have which makes this algorithm perfectly reasonable.
But first, a curiosity. I used the algorithm described below on the 62887 lowercase words in /usr/share/dict/words on my machine. Here's the best output: individualizing phototypesetter.
Here's the algorithm:
As in the other answers, we will use bitmasks to represent sets of letters. Thus, the integers from 0 to will represent all possible subsets of our alphabet, 0 being the empty set.
Our algorithm will consist of three steps:
- For each set S of letters, find the longest word that consists of exactly those letters.
- For each set S of letters, find the longest word that consists of at most those letters (i.e., some letters may be unused, but you cannot use a letter that does not belong to S).
- For each word w, compute length(w) + length(longest word out of letters not in w) and pick the maximum.
Step 1 is easy: just take an array of size , then read the input, and for each word update the corresponding cell. This can be done in .
Step 2 can be done using dynamic programming. We process the subsets of our alphabet in the order 0, 1, ..., . (Note that this order that has the property that for any set S, all subsets of S are processed before S.) For each set of letters S, the best word for S is either the best word made out of exactly these letters (this we computed in phase 1), or at least one letter is unused. We try all possibilities for the unused letter and pick the best one. All of this takes time.
Step 3 is again easy. If we stored the set of letters for each word in step 1, step 3 can now be done in time. Hence the overall time complexity is .
No comments:
Post a Comment