Hello World
Spiga

如何生成一段Typoglycemia文本?

2012-11-05 23:39 by 老赵, 3647 visits

Typoglycemia是个新词,描述的是人们识别一段文本时的一个有趣的现象:只要每个单词的首尾字母正确,中间的字母顺序完全打乱也没有关系,照样可以正常理解。例如这么一段文字:

I cdnuol't blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg: the phaonmneel pweor of the hmuan mnid. Aoccdrnig to a rseearch taem at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Scuh a cdonition is arppoiatrely cllaed Typoglycemia.

Amzanig huh? Yaeh and you awlyas thguoht slpeling was ipmorantt.

我们其实可以较为轻松地识别出其原文:

I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.

Amazing, huh? Yeah and you always thought spelling was important.

事实上中文也有类似的性质,文字序乱是不响影正阅常读的。

那么我们可以如何从一段正确的文本(下方)生成一段Typoglycemia文本(上方)呢?这其实是我今天出的一道面试题,简单地说就是要求实现这么一个方法:

string MakeTypoglycemia(string text); 

规则很简单:

  1. 保持所有非字母的字符位置不变。
  2. 保持单词首尾字母不变,中间字符打乱。

所谓”打乱“,可以随意从网上找一段数组乱序的算法即可,无需保证一定改变(例如某些除去头尾只有两个字母的单词,偶尔保留不变问题也不大)或者每个字符都不在原来的位置上。不过,我们假设这段代码会被大量调用,因此希望可以尽可能地效率高些,内存使用少些。

要不您也来试下?这题我建议使用C#或Java来实现,因为这两个环境里的内存分配行为比较容易判断。当然您用Ruby,Python或JavaScript等语言问题也不大,效率容易判断,但内存分配方便可能就比较难计较了。

其实这题还有后续,那就是把目标反一反:从一个前后确定中间乱序的单词,找到其原始的,正确的单词。自然,我们会得到一个长长的单词列表,十万个单词吧,并保证没有两个单词仅仅是中间几个字符的顺序不同。那么,您会如何设计一个数据结构,让我们可以快速的从一个乱序后的单词找到其正确形式呢?

不过还是那句话:一定要写下代码来,否则思路不谈也罢。

Creative Commons License

本文基于署名 2.5 中国大陆许可协议发布,欢迎转载,演绎或用于商业目的,但是必须保留本文的署名赵劼(包含链接),具体操作方式可参考此处。如您有任何疑问或者授权方面的协商,请给我留言

Add your comment

33 条回复

  1. Hooopo
    50.116.12.*
    链接

    Hooopo 2012-11-06 01:05:11

    doc = <<-EOF
    I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.
    EOF
    
    doc.gsub(/([\w']+)/){|world| world.sub(/(\w)(\w*)(\w)/){$1 + $2.reverse + $3}}
    

    Ruby正则版~

  2. 逆铭
    207.151.233.*
    链接

    逆铭 2012-11-06 04:51:41

    Scala

    import util.Random.shuffle
    
    val doc: String =
      """I couldn't believe that I could actually understand what I was reading:
        |the phenomenal power of the human mind. According to a research team at
        |Cambridge University, it doesn't matter in what order the letters in a
        |word are, the only important thing is that the first and last letter be
        |in the right place. The rest can be a total mess and you can still read
        |it without a problem. This is because the human mind does not read every
        |letter by itself, but the word as a whole. Such a condition is appropriately
        |called Typoglycemia.
        |
        |Amazing, huh? Yeah and you always thought spelling was important.
        |""".stripMargin
    
    // Produce typos
    
    // The document can be huge, possibly a stream of chars
    val docStream = doc.toStream
    
    def produceTypoglycemia(charStream: Stream[Char], accu: Vector[Char] = Vector.empty): Stream[String] = {
      def makeWordType(wordAcc: Vector[Char]) = {
        if (wordAcc.length <= 1) wordAcc.mkString
        else
          wordAcc.head + shuffle(wordAcc.slice(1, wordAcc.length - 1)).mkString + wordAcc.last
      }
      charStream match {
        case head #:: tail if head.isLetter =>
          // Vector supports effectively constant time prepend and append operation
          produceTypoglycemia(tail, accu :+ head)
        case head #:: tail =>
          makeWordType(accu) #:: head.toString #:: produceTypoglycemia(tail, Vector.empty)
        case _ =>
          Stream.empty
      }
    }
    
    val typoglycemia = produceTypoglycemia(docStream)
    println(typoglycemia.mkString)
    
    // Crude recovery
    
    // Can't handle ambiguity, and not handling case
    // To disambiguate, use a language model (e.g. bi-gram)
    val dict = doc.split("[^A-Za-z]+").groupBy(_.sorted).mapValues(_.head).withDefault(identity)
    println(typoglycemia.map(s => dict(s.sorted)).mkString)
    
  3. Also
    175.41.31.*
    链接

    Also 2012-11-06 11:03:06

    菜鸟超级冗长java版....

    import java.util.Random;
    
    public class Main {
    
        /**
         * @param args
         */
        public static void main(String[] args) {
            String text = "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.Amazing, huh? Yeah and you always thought spelling was important.";
            System.out.print(getText(text));
        }
    
        public static String getText(String text) {
            String[] resultText = text.split(" ");
            String fString = "";
            for (int i = 0; i < resultText.length; i++) {
                fString += String.valueOf(getRandomArray(resultText[i])) + " ";
            }
            return fString;
        }
    
        public static char[] getRandomArray(String text) {
            char[] sequence = text.toCharArray();
            int no = 0;
            if(sequence[sequence.length-1]>= 61 && sequence[sequence.length-1] <= 122)
                no = sequence.length;
            else
                no = sequence.length-1;
            if (no > 3) {
                Random random = new Random();
                for (int i = 1; i < no - 1; i++) {
                    int p = random.nextInt(no - 1);
                    if (p != 0 && p != no) {
                        char tmp = sequence[i];
                        if (tmp >= 61 && tmp <= 122 && sequence[p] >= 61
                                && sequence[p] <= 122) {
                            sequence[i] = sequence[p];
                            sequence[p] = tmp;
                        }
                    }
                }
                random = null;
            }
            return sequence;
        }
    }
    
  4. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 12:01:30

    LinqPad 手写版(C#)

    void Main()
    {
        var source = "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.\nAmazing, huh? Yeah and you always thought spelling was important.";
    
        var result = MakeTypoglycemia(source);
    
        Console.WriteLine("{0}", result, result.Length);
    }
    
    // Define other methods and classes here
    const string Seperators = " .,'?!-\r\n";
    
    bool IsSeperator(char c)
    {
        return Seperators.Contains(c);
    }
    
    string MakeTypoglycemia(string text)
    {
        //we put a wrod's fsrit cahr itno the beffur,
        //tehn fnid the end of it,
        //then suflfhe the chars ibeewnetn and bufefr them,
        //then bfuefr the last char,
        //and pccreod to nxet word
        var buffer = new StringBuilder();
        //pinots to the fisrt char of a word
        var wordStart = 0;
    
        var index = 0;
        for(; index < text.Length; index++)
        {
            var c = text[index];
            if(wordStart == index)
            {
                buffer.Append(c);
            }
            if(IsSeperator(c))
            {
                //now idenx ponits to the non-wrod-cahr flooilwng a word
                CompleteWord(buffer, wordStart, index, text);
                wordStart = index + 1;
            }
        }
    
        if(wordStart < text.Length - 1)
            CompleteWord(buffer, wordStart, text.Length-1, text);
    
        return buffer.ToString();
    }
    
    private void CompleteWord(StringBuilder buffer, int wordStart, int end, string text)
    {
        var c = text[end];
        if(end - wordStart == 0)
        {
            //do nhiontg sncie the frsit and olny cahr aealdry put to bffuer
        }
        else if(end - wordStart == 1)
        {
            buffer.Append(c);
        }
        else if(end - wordStart == 2)
        {
            buffer.Append(text[wordStart+1]);
            buffer.Append(c);
        }
        else if(end - wordStart == 3)
        {
            buffer.Append(text[wordStart+1]);
            buffer.Append(text[wordStart+2]);
            buffer.Append(c);
        }
        else
        {
            var indexMapping = GetMapping(end - wordStart - 2);
            foreach(var m in indexMapping)
            {
                buffer.Append(text[wordStart +1 + m]);
            }
            buffer.Append(text[end-1]);
            buffer.Append(c);
        }
    }
    
    IEnumerable<int> GetMapping(int length)
    {
        Debug.Assert(length >= 2);
        //if it's ok, we can cache the index sequences for specific target length, then it would be much faster.
        var buffer = Enumerable.Range(0, length).ToList();
        Random rnd = new Random();
    
        for (int i = 0; i < buffer.Count; i++)
        {
            int j = rnd.Next(i, buffer.Count);
            yield return buffer[j];
    
            buffer[j] = buffer[i];
        }   
    }
    
  5. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 12:11:04

    至于反方向的算法,我来嘴炮一把,楼主不满可删除. 找26个素数,分别对应26个字母(假设只有26个字母,不区分大小写).对词典每个单词算其各字母对应数字的乘积,以这个积为hash进行索引. 碰到打乱的单词,直接算乘积就可以找到对应的原始单词.

  6. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 12:25:28

    哦,楼上忘了也有可能开头结尾不同的单词但是是由同样字母组合的.也简单,hash方法改一下: prime(首字母) ^ 3 * prime(尾字母) ^ 2 * (其它字母prime之积).

  7. 链接

    麒麟.NET 2012-11-06 12:30:09

    @deerchao

    不同的单词有可能乘积相同,比如af和bc。

    我的想法是,将字典中的单词以字母顺序排序,作为key(反向算法可以不考虑首尾字母和特殊符号位置不变的约束)

  8. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 12:47:31

    晕,还是错的. 因为一个单词里边的字母是可在不同位置重复出现的. 算了,没时间思考了,这几条留言帮我删除了吧.

  9. 原子发射器遥控器
    142.54.177.*
    链接

    原子发射器遥控器 2012-11-06 13:28:15

    第一个

    doc = <<-EOF
    I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.
    EOF
    
    rd = doc.split.map do |word|
      word[1...-1] = word[1...-1].each_char.to_a.shuffle!.join
      word
    end.join(' ')
    
    puts rd
    

    第二个

    class
      attr_accessible :ht,#head & tails char
      :length,
       :same_chars,#有几个不同的字符
       :diff_chars,#有几个相同的字符
       :diff_char_max#最多的相同字符
    end
    

    大概这个思路,具体细化程度视需求而定

  10. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 13:45:02

    翠花,上代码!(C#)

    从打乱的单词中恢复原来的单词:

    void Main()
    {
        BuildPrimes();
        //用例句作为单词库
        var sentense = @"I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.
    
    Amazing, huh? Yeah and you always thought spelling was important.".ToLower();
    
        BuildDictioary(SplitWords(sentense));
    
    
        foreach(var topyimygcela in SplitWords(MakeTypoglycemia(sentense)))
        {
            var result = FindOriginalWord(topyimygcela);
            Console.WriteLine("{0} => {1}", topyimygcela, result);
        }
    }
    
    // Define other methods and classes here
    private Dictionary<System.Numerics.BigInteger, string> _dictionary;
    private Dictionary<char, int> _primesHead;
    private Dictionary<char, int> _primesBody;
    private Dictionary<char, int> _primesTail;
    
    private void BuildPrimes()
    {
        var letters = Enumerable.Range('a', 26).Select(x => (char)x).ToList();
        var index = 0;
        _primesBody = GetAllPrimes().Take(letters.Count).ToDictionary(x => letters[index++]);
        index = 0;
        _primesHead = GetAllPrimes().Skip(letters.Count).Take(letters.Count).ToDictionary(x => letters[index++]);;
        index = 0;
        _primesTail = GetAllPrimes().Skip(letters.Count*2).Take(letters.Count).ToDictionary(x => letters[index++]);;
    }
    
    private IEnumerable<int> GetAllPrimes()
    {
        return Enumerable.Range(2, int.MaxValue-2)
            .Where(x => {
                for(var i = 2; i <= Math.Sqrt(x); i++)
                {
                    if(x % i == 0)
                        return false;
                }
                return true;
            });
    }
    
    private IEnumerable<string> SplitWords(string sentence)
    {
        var words = Regex.Matches(sentence, @"\w+")
                        .Cast<Match>()
                        .Select(x => x.Value);
    
        return words;
    }
    
    private string FindOriginalWord(string topyimygcela)
    {
        if(topyimygcela.Length <= 3)
            return topyimygcela;
    
        var hash = ComputeHash(topyimygcela);
        string result;
        _dictionary.TryGetValue(hash, out result);
        return result;
    }
    
    private System.Numerics.BigInteger ComputeHash(string word)
    {
        if(word.Length <= 3)
            return -1;
    
        var head = word[0];
        var tail = word[word.Length-1];
    
        System.Numerics.BigInteger hash = _primesHead[head] * _primesTail[tail];
        for(var i=1; i<word.Length-1; i++)
        {
            hash *= _primesBody[word[i]];
        }
    
        return hash;
    }
    
    private void BuildDictioary(IEnumerable<string> words)
    {
        _dictionary = words.Distinct()
            .Where(x => x.Length > 3)
            .ToDictionary(ComputeHash);
    }
    
    //==================以下是用来打乱单词的====================================
    const string Seperators = " .,'?!-\r\n";
    
    bool IsSeperator(char c)
    {
        return Seperators.Contains(c);
    }
    
    string MakeTypoglycemia(string text)
    {
        //we put a wrod's fsrit cahr itno the beffur,
        //tehn fnid the end of it,
        //then suflfhe the chars ibeewnetn and bufefr them,
        //then bfuefr the last char,
        //and pccreod to nxet word
        var buffer = new StringBuilder();
        //pinots to the fisrt char of a word
        var wordStart = 0;
    
        var index = 0;
        for(; index < text.Length; index++)
        {
            var c = text[index];
            if(wordStart == index)
            {
                buffer.Append(c);
            }
            if(IsSeperator(c))
            {
                //now idenx ponits to the non-wrod-cahr flooilwng a word
                CompleteWord(buffer, wordStart, index, text);
                wordStart = index + 1;
            }
        }
    
        if(wordStart < text.Length - 1)
            CompleteWord(buffer, wordStart, text.Length-1, text);
    
        return buffer.ToString();
    }
    
    private void CompleteWord(StringBuilder buffer, int wordStart, int end, string text)
    {
        var c = text[end];
        if(end - wordStart == 0)
        {
            //do nhiontg sncie the frsit and olny cahr aealdry put to bffuer
        }
        else if(end - wordStart == 1)
        {
            buffer.Append(c);
        }
        else if(end - wordStart == 2)
        {
            buffer.Append(text[wordStart+1]);
            buffer.Append(c);
        }
        else if(end - wordStart == 3)
        {
            buffer.Append(text[wordStart+1]);
            buffer.Append(text[wordStart+2]);
            buffer.Append(c);
        }
        else
        {
            var indexMapping = GetMapping(end - wordStart - 2);
            foreach(var m in indexMapping)
            {
                buffer.Append(text[wordStart +1 + m]);
            }
            buffer.Append(text[end-1]);
            buffer.Append(c);
        }
    }
    
    IEnumerable<int> GetMapping(int length)
    {
        Debug.Assert(length >= 2);
        //if it's ok, we can cache the index sequences for specific target length, then it would be much faster.
        var buffer = Enumerable.Range(0, length).ToList();
        Random rnd = new Random();
    
        for (int i = 0; i < buffer.Count; i++)
        {
            int j = rnd.Next(i, buffer.Count);
            yield return buffer[j];
    
            buffer[j] = buffer[i];
        }   
    }
    
  11. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 13:52:20

    @麒麟.NET

    不会的. af和bc肯定不会相同.因为它们分解质因数后,因子不同.

  12. deerchao
    108.171.245.*
    链接

    deerchao 2012-11-06 14:02:49

    冒似我的实现中有一点和需求不一致,不过稍微改一下就可以了. couldn't 在需求中是一个单词; 而在我的实现中是当成两个单词加一个分隔符处理的. 另外,原需求说要"保持所有非字母的字符位置不变", 我的实现里是所有预定的分隔符位置不变, 这个要改更容易了,直接改 IsSeperator() 就可以了.

    bool IsSeperator(char c)
    {
         return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z';
    }
    
  13. 链接

    2012-11-06 14:47:56

    不会写Lambda表达式……给老赵跪了,这个算是c# 2.0版吧T_T:

    话说这个评论编辑器还真奇怪……

    public string MakeTypoglycemia(string text)
    {
        char[] textarray = text.ToCharArray(); //原始文字
        StringBuilder output = new StringBuilder(textarray.Length); //输出文字
    
        for (int i = 0; i < textarray.Length; i++)
        {
            int wordlength = 1;
            if (char.IsLetter(textarray[i]))
            {
                while (char.IsLetter(textarray[i + wordlength])) //计算单词长度
                {
                    wordlength++;
                }
                if (wordlength > 3)
                {
                    output.Append(textarray[i]);
                    output.Append(RandomizeWord(text.Substring(i + 1, wordlength - 2)));
                    output.Append(textarray[i + wordlength - 1]);
                }
                else
                {
                    output.Append(text.Substring(i, wordlength));
                }
                i = i + wordlength - 1;
            }
            else //不是字母
            {
                output.Append(textarray[i]);
            }
        }
        return output.ToString();
    }
    
    //数组乱序
    char[] RandomizeWord(string word)
    {
        byte[] keys = new byte[word.Length];
        (new Random()).NextBytes(keys);
        char[] wordArray = word.ToCharArray();
        Array.Sort(keys, wordArray);
        return wordArray;
    }
    
  14. edc杨洋
    218.69.33.*
    链接

    edc杨洋 2012-11-06 16:34:59

    Java:

    String justDoIt(String content) {
    
        String[] strs = content.replaceAll("[^a-zA-Z':?!.,]", "\\|").split("\\|");
        for (int i = 0; i < strs.length; i++) {
            char[] cs = strs[i].trim().toCharArray();
            if ("".equals(String.valueOf(cs))) continue;
            if (cs.length <= 2) {
                strs[i] = String.valueOf(cs);
                continue;
            }
    
            char beg = cs[0];
            char end = cs[cs.length - 1];
            String symbol = "";
            if (String.valueOf(cs).matches("\\b\\w*\\b[,.?!:]")) {
                end = cs[cs.length - 2];
                symbol = String.valueOf(cs[cs.length - 1]);
            }
    
            ArrayList<String> al = new ArrayList<String>();
            for (char c : strs[i].replaceAll("\\b\\w|\\w\\b[,:?!.]", "").toCharArray())
                al.add(String.valueOf(c));
    
            Collections.shuffle(al);
            Iterator it = al.iterator();
            StringBuffer sb = new StringBuffer();
            while (it.hasNext())
                sb.append(it.next());
    
            strs[i] = beg + sb.toString() + end + symbol;
        }
    
        StringBuffer buf = new StringBuffer();
        for (String str : strs)
            buf.append(str + " ");
    
        return buf.toString();
    }
    
  15. 链接

    一刀一个 2012-11-07 16:05:58

    C#

    var testWords = @"Typoglycemia是个新词,描述的是人们识别一段文本时的一个有趣的现象:只要每个单词的首尾字母正确,中间的字母顺序完全打乱也没有关系,照样可以正常理解。例如这么一段文字:I cdnuol't blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg: the phaonmneel pweor of the hmuan mnid. Aoccdrnig to a rseearch taem at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Scuh a cdonition is arppoiatrely cllaed Typoglycemia.";
    
    Func<List<char>, string> Rwords = (x) =>
    {
        var length = x.Count;
        var chars = x;
        for (int i = 1; i < length - 1; i++)
        {
            Random r = new Random();
            var n = r.Next(1, length - 1);
            var t = chars[i];
            chars[i] = chars[n];
            chars[n] = t;
        } 
    
        return string.Join("", chars);
    };
    var s = new System.Text.StringBuilder();
    var testWordsArray = testWords.ToCharArray();
    List<char> tempWords = null;
    foreach (var c in testWordsArray)
    {
        if (('a' <= c && c <= 'z')
            ||
            ('A' <= c && c <= 'Z')
        )
        {
            if (tempWords == null)
            {
                tempWords = new List<char>();
            }
            tempWords.Add(c);
        }
        else
        {
            if (tempWords != null)
            {
                var rwords = Rwords(tempWords);
                s.Append(rwords);
                tempWords = null;
            }
            s.Append(c);
        }
    }
    
  16. zyf
    114.212.86.*
    链接

    zyf 2012-11-07 19:27:08

    用Java写的,用了一些自己写的仿LINQ函数,不过感觉完全没效率

    public static String MakeTypoglycemia(final String text)
    {
        final CharMatcher matcher = CharMatcher.JAVA_LOWER_CASE.or(CharMatcher.JAVA_UPPER_CASE);
    
        //所有需打乱字符的原始下标
        AbstractUnmodifiableIterable<Integer> sufferIndexs = Iterables.Range(1, text.length() - 2).Where(
                new Predicate<Integer>() {
    
                    @Override
                    public boolean DoPredicate(Integer in)
                    {
                        return matcher.apply(text.charAt(in.intValue()));
                    }
                });
    
        //所有需打乱字符
        List<Character> buffer = sufferIndexs.Map(new Func1<Integer, Character>() {
    
            @Override
            public Character DoFunc(Integer in1)
            {
                return Character.valueOf(text.charAt(in1.intValue()));
            }
        }).ToList();
    
        //打乱字符
        Collections.shuffle(buffer);
    
        final StringBuilder builder = new StringBuilder(text);
    
        sufferIndexs.Zip(buffer).Each(new Action1<Tuple2<Integer, Character>>() {
    
            @Override
            public void DoAction(Tuple2<Integer, Character> input)
            {
                builder.setCharAt(input.GetItem1().intValue(), input.GetItem2().charValue());
            }
        });
    
        return builder.toString();
    }
    
  17. 老赵
    admin
    链接

    老赵 2012-11-07 19:45:08

    好像很多同学都不关注效率,那做这题基本意义失去了一大半了,本来就是这么简单的问题,只追求结果正确,太普通了。

  18. 链接

    Manyan Ouyang 2012-11-07 21:03:38

    贴完之后发现些小问题。重贴下(C#)。

        public static string MakeRandom(string context)
    {
    
        if (string.IsNullOrEmpty(context) || context.Trim().Length < 4)
        {
            return context;
        }
    
        var sb = new StringBuilder();
        var afterEnd=0;//the positon after a word's end.
        var start = 0;//the start positon of a word.
        var middle = 0;
    
        sb.Append(context);
    
        Random random = new Random();
        while(afterEnd<sb.Length)
        {
            if (char.IsLetter(context[afterEnd]))
            {
                start = afterEnd;
                do
                {
                    afterEnd++;
                } while (afterEnd < sb.Length&&( char.IsLetter(context[afterEnd]) ) );
    
                if(afterEnd>=start+4)   //only deal with word having four or more letters.
                {
                    middle = start + (afterEnd-1 - start) / 2;
    
                    for (var i = start + 1; i <= middle; i++)
                    {
                        var randomNumber = random.Next(middle+1, afterEnd - 1);
                        var tempChar = sb[randomNumber];
                        sb[randomNumber] = sb[i];
                        sb[i] = tempChar;
                    }
                }
    
                afterEnd++;
            }
            else
            {
                do
                {
                    afterEnd++;
                } while ((afterEnd < sb.Length && (!char.IsLetter(context[afterEnd]))));
            }
        }
            return sb.ToString();
    }
    
  19. zyf
    114.212.86.*
    链接

    zyf 2012-11-07 23:36:23

    用LINQ写的低效版本:

    public static String MakeTypoglycemia(String text) {
    
        Predicate<Char> matcher = c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    
        //置所有需打乱字符的原始下标
    
        IList<Int32> shuffleIndexs = Enumerable.Range(1, text.Length - 2).Where(index => matcher(text[index])).ToList();
    
        //所有需打乱字符
        IList<Char> buffer = shuffleIndexs.Select(index => text[index]).ToList();
    
        //打乱字符
        buffer.Shuffle();
    
        StringBuilder sb = new StringBuilder(text);
    
        foreach (var x in shuffleIndexs.Zip(buffer, (index, c) => sb[index] = c)) { }
    
        return sb.ToString();
    }
    
    public static void Shuffle<T>(this IList<T> list)
    {
        var randomNumber = new Random(DateTime.Now.Millisecond);
        var n = list.Count;
        while (n > 1)
        {
            n--;
            var k = randomNumber.Next(n + 1);
            var value = list[k];
            list[k] = list[n];
            list[n] = value;
        }
    }
    
    private Predicate<Char> matcher = c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    private static Random randomNumber = new Random(DateTime.Now.Millisecond);
    

    只用了一个BitArray的较高效版本:

    public static String MakeTypoglycemia2(String text)
    {
        int swapUpperBound = text.Length - 1;
    
        Predicate<Char> matcher = c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    
        BitArray indexs = new BitArray(swapUpperBound);
    
        //置所有需打乱字符下标
        foreach (Int32 i in Enumerable.Range(1, swapUpperBound - 1).Where(index => matcher(text[index])))
        {
            indexs.Set(i, true);
        }
    
        StringBuilder sb = new StringBuilder(text);
    
        for (int i = 1; i < swapUpperBound; ++i)
        {
            if (!indexs.Get(i))//指定位置不用交换
            {
                continue;
            }
    
            //i 位置的元素需要交换
    
            //在i后面的位置找一个可交换元素
            int randomIndex = randomNumber.Next(i + 1, swapUpperBound);
            int swapIndex = -1;
            int j = randomIndex;
    
            for (j = randomIndex; j < swapUpperBound; ++j)
            {
                if (indexs.Get(j))
                {
                    swapIndex = j;
                    break;
                }
            }
    
            if (swapIndex == -1)//randomIndex正向到末尾未找到可交换元素
            {
                for (j = i + 1; j < randomIndex; ++j)//从i下一个元素找到randomIndex
                {
                    if (indexs.Get(j))
                    {
                        swapIndex = j;
                        break;
                    }
                }
            }
    
            if (swapIndex == -1)//最终未找到可交换元素
            {
                break;//退出外层循环
            }
            else
            {
                //交换i与swapIndex处元素
                Char temp = sb[i];
                sb[i] = sb[swapIndex];
                sb[swapIndex] = temp;
    
                //置位表示不用再交换
                indexs.Set(i, false);
                indexs.Set(swapIndex, false);
            }
    
        }
    
        return sb.ToString();
    }
    
  20. Nyyrikki
    58.161.130.*
    链接

    Nyyrikki 2012-11-08 08:53:22

    C#

    static string MakeTypoglycemia(string text)
    {
        char[] seq = text.ToCharArray();
        int len = seq.Length;
        var buf = new char[len];
        int word_start = 0;
        var rnd = new Random();
        for (int i = 0; i < len; i++)
        {
            if (!Char.IsLetter(seq[i]))
            {
                int shuffle_start = word_start + 1,
                    k = i - 2;
    
                while (k > shuffle_start)
                {
                    var j = rnd.Next(shuffle_start, k);
                    var t = buf[k];
                    buf[k] = buf[j];
                    buf[j] = t;
                    k--;
                }
                word_start = i + 1;
            }
            buf[i] = seq[i];
        }
        return new String(buf);
    }
    
  21. 木草非人
    112.95.149.*
    链接

    木草非人 2012-11-08 14:06:37

    C#

    private static string amazingWord(string word)
    {
        int length = word.Length;           
    
        if (length > 3)
        {
            char[] wordLetters = word.ToCharArray();
            int count = length - 2;
            while (--count > 0)
            {
                int k = random.Next() % count;
                char temp = wordLetters[k + 1];
                wordLetters[k + 1] = wordLetters[count + 1];
                wordLetters[count + 1] = temp;
            }
            return new string(wordLetters);
        }
        else
        {
            return word;
        }            
    }
    
  22. lixiong
    67.160.42.*
    链接

    lixiong 2012-11-08 14:19:57

    foreach word
      if converrted before, use cached value
      otherwise, do convert and cache.
    
    convert:
      if length<=3, nothing.
      startpos = 1;
      nextSwapPos = findNextSwap(startPos, lastIndex);
      if(nextSwapPos is or exceed last pos), nothing;
      else swap(startPos, nextSwapPos), startPos=nextSwapPos+1;
    
    findNextSwap(pos)
      start=pos+1; 
      begin: 
      if(start>=lastIndex) return pos;
      if(getval(start) != getval(start-1)) return start;
      start++, goto begin
    
  23. wheat
    202.45.129.*
    链接

    wheat 2012-11-08 15:43:41

    来个java版的

    public class Typoglycemia {
    
        private final static int MAX_WORD_LENGTH = 100; 
        private final static Random random = new Random();
    
        public String makeTypoglycemia(String text) {
            if (text == null) throw new IllegalArgumentException("input text cannot be null!");
            if (text.length() < 4) return text;
    
            StringBuilder result = new StringBuilder();
            char[] buffer = new char[MAX_WORD_LENGTH];
            int offset = 0;
            int bufferLength = 0;
    
            while (offset < text.length()) {
                char ch = text.charAt(offset++);
                if (Character.isLetter(ch)) {
                    buffer[bufferLength++] = ch;
                } else {
                    if (bufferLength > 0) {
                        result.append(shuffle(buffer, bufferLength), 0, bufferLength);
                        bufferLength = 0;
                    }
                    result.append(ch);
                }
            }
    
            if (bufferLength > 0) {
                result.append(shuffle(buffer, bufferLength), 0, bufferLength);
            }
    
            return result.toString();
        }
    
        private char[] shuffle(char[] word, int length) {
            if (length < 4) return word;
    
            for (int i = 1; i < length - 2; i++) {
                int rand = i + random.nextInt(length - i - 1);
                if (i != rand) {
                    char tmp = word[i];
                    word[i] = word[rand];
                    word[rand] = tmp;
                }
            }
            return word;
        }
    
        public static void main(String[] args) {
            String text = "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.\r\nAmazing, huh? Yeah and you always thought spelling was important.";
            System.out.println(new Typoglycemia().makeTypoglycemia(text));
        }
    }
    
  24. dante
    218.25.139.*
    链接

    dante 2012-11-08 16:59:01

    用C的写法写的Java......

    贴在Gist上了

  25. UserId.Generate()
    180.173.167.*
    链接

    UserId.Generate() 2012-11-09 20:07:36

    仔细看了看老赵的描述,看来具体的实现函数Shuffle()老赵并不关心具体实现,本题的核心应该是如何部署和调用Shuffle,实现完善的内存分配和缓存机制。 代码实在写得太烂而且没啥想法所以不献丑了……既然老赵不喜欢谈思路,那么思路也免了……

  26. SomeOne
    110.122.126.*
    链接

    SomeOne 2012-11-10 17:17:35

    没装VS,于是用JAVA把最简单的部分实现了,看楼上的意思是需要把Shuffle()的输入输出做缓存吗?

    public class TypoglycemiaMachine {
    
        // 不过,我们假设这段代码会被大量调用,因此希望可以尽可能地效率高些,内存使用少些。
        static private StringBuilder toChaosText(StringBuilder text) {
            return text.reverse(); // 这货不是随机...
        }
    
        static String MakeTypoglycemia(String text) {
            char chCount = 0; // 给单词的字母计数
            char lastCh = 0; // 记录遍历到的最后一个字母
            StringBuilder midstr = new StringBuilder(); // 收录去首尾字母后单词中间的字符串
            StringBuilder result = new StringBuilder();
            for (char ch : text.toCharArray()) {
                if (Character.isLetter(ch)) {
                    switch (++chCount) {
                    default: // 收录上一个字母, fall through
                        midstr.append(lastCh);
                    case 2:
                        lastCh = ch; // 记录最后一个字母
                        break;
                    case 1: // 首字母直接收录进结果
                        result.append(ch);
                    }
                } else {
                    switch (chCount) { // 一路 fall through
                    default:
                        midstr = toChaosText(midstr);
                    case 3:
                        result.append(midstr); // "has","and","but"之类的
                        midstr = new StringBuilder();
                    case 2:
                        result.append(lastCh); // "of","as","or"之类的
                        lastCh = 0;
                    case 1:
                        chCount = 0; // 'I','a'之类的
                    case 0:
                        result.append(ch); // 连续的非字母字符直接收录后跳过
                    }
                }
            }
            return result.toString();
        }
    }
    
  27. lixiong
    67.160.42.*
    链接

    lixiong 2012-11-10 17:48:01

    cao, 怎么都还是随机调换 understand -> utnadresnd return -> rruten 这样谁认识啊 仔细看例子,都是两两兑换,如果重复字符就跳过。老赵的“中间的字母顺序完全打乱也没有关系”就是估计让人上当的。。。

  28. 老赵
    admin
    链接

    老赵 2012-11-10 23:19:33

    @lixiong

    比如“I couldn't believe that I could actually understand” => “I cdnuol't blveiee taht I cluod aulaclty uesdnatnrd”,哪里仅仅是两两交换了。

  29. dante
    113.227.63.*
    链接

    dante 2012-11-12 01:01:43

    @SomeOne

    result那个StringBuilder可以干掉吧,因为toCharArray得到的数组可以直接操作啊。。。

  30. SomeOne
    211.141.232.*
    链接

    SomeOne 2012-11-12 15:58:04

    @dante

    恩看到解答篇了,我这写的完全是反面教材啊

  31. edc杨洋
    218.69.33.*
    链接

    edc杨洋 2012-11-13 10:08:10

    闲来无事又修改了一下,标点,换行都考虑到了,修复了一下上次的代码

    public class Typoglycemia {
    
        String justDoIt(String content) {
    
            String[] strs = content.replaceAll("\\n", "^").replaceAll("\\s+|\\^", "\\|").split("\\|");
    
            for (int i = 0; i < strs.length; i++) {
    
                char[] cs = strs[i].trim().toCharArray();
                if (cs.length <= 2) {
                    strs[i] = String.valueOf(cs);
                    continue;
                }
    
                char beg = cs[0];
                StringBuilder end = new StringBuilder().append(cs[cs.length - 1]);
    
                if (String.valueOf(cs).matches("^\\w+[?!,.:]$"))
                    end.insert(0, cs[cs.length - 2]);
    
                ArrayList<String> al = new ArrayList<String>();
                for (char c : strs[i].replaceAll("^\\w|\\w$|\\w[,:?!.]$", "").toCharArray())
                    al.add(String.valueOf(c));
    
                Collections.shuffle(al);
                Iterator it = al.iterator();
                StringBuilder sb = new StringBuilder();
                while (it.hasNext())
                    sb.append(it.next());
    
                strs[i] = beg + sb.toString() + end;
            }
    
            StringBuilder buf = new StringBuilder();
            for (String str : strs)
                buf.append(str + " ");
    
            return buf.deleteCharAt(buf.length() - 1).toString().replaceAll("\\s\\s","\\\n\\\n");
        }
    
        public static void main(String[] args) {
    
            String content = "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.\n" +
                    "\n" +
                    "Amazing, huh? Yeah and you always thought spelling was important.";
    
            System.out.println(new Typoglycemia().justDoIt(content));
        }
    }
    

    result:

    I clon'udt bleevie taht I cluod auacllty uarsntdned what I was rdeaing: the pnehnmeoal pweor of the human mnid. Ancdocirg to a rasecerh taem at Cgamdbire Usnrietivy, it don'est mtaetr in what odrer the lteetrs in a word are, the olny iporatmnt tihng is taht the frsit and last lteetr be in the rhgit plcae. The rest can be a taotl mses and you can sltil read it wuithot a poblerm. Tihs is bucasee the haumn mnid deos not read eevry letetr by itslef, but the word as a wolhe. Scuh a coiiotdnn is aropeptilarpy cleald Tpylyigcomea.

    Ainzmag, huh? Yaeh and you aaywls tohghut selpinlg was iropntamt.

  32. CJC
    134.76.81.*
    链接

    CJC 2013-12-31 14:07:13

    能不能尝试把一个1 - (n-1)的数组randomize一下,然后用那个乱序数组排列原来的char[]?

  33. 链接

    张秦 2015-04-23 10:33:06

    string MakeTypoglycemia(string text)
    {
            return Regex.Replace(text, "\\w{4,}", (x) =>
            {
                return CompleteWord(x.Value);    
            });
        }
        string CompleteWord(string word)
        {
            Random r = new Random();
            char[] wc = word.ToArray();
            char temp;
            int source;
            int target;
            for (int i = 0; i < wc.Length - 2; i++)
            {
                source = r.Next(1, wc.Length - 1);
                target = r.Next(1, wc.Length - 1);
                temp = wc[source];
                wc[source] = wc[target];
                wc[target] = temp;
            }
            return new string(wc);
        }
    

    思路:正则匹配非空4位以上的单词(如果收尾不变则4位以上的单词才需要乱序)进行replace,replace取中间字符(去掉头尾)随机打乱。

发表回复

登录 / 登录并记住我 ,登陆后便可删除或修改已发表的评论 (请注意保留评论内容)

昵称:(必填)

邮箱:(必填,仅用于Gavatar

主页:(可选)

评论内容(大于5个字符):

  1. Your Name yyyy-MM-dd HH:mm:ss

使用Live Messenger联系我