Chapter 9 Case study: word play 案例学习:单词游戏

This chapter presents the second case study, which involves solving word puzzles by searching for words that have certain properties. For example, we’ll find the longest palindromes in English and search for words whose letters appear in alphabetical order. And I will present another program development plan: reduction to a previously-solved problem.


9.1 Reading word lists 读取字符列表

For the exercises in this chapter we need a list of English words. There are lots of word lists available on the Web, but the one most suitable for our purpose is one of the word lists collected and contributed to the public domain by Grady Ward as part of the Moby lexicon project (see Here). It is a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. In the Moby collection, the filename is 113809of.fic; you can download a copy, with the simpler name words.txt, from Here.

本章练习中,咱们需要用一个英语词汇列表。网上有很多,不过最适合我们的列表并且是共有领域的,莫过于 Grady Ward这份词汇表,这是Moby词典计划的一部分(点击此链接访问详情)。这是一份113,809个公认的字谜表;也就是公认可以用于字谜游戏以及其他文字游戏的单词。在 Moby 的词汇项目中,该词表的文件名为113809of.fic;你可以下载一份副本,这里名字简化成 words.txt 了,下载地址在这里

This file is in plain text, so you can open it with a text editor, but you can also read it from Python. The built-in function open takes the name of the file as a parameter and returns a file object you can use to read the file.

这个文件就是纯文本,所以你可以用文本编辑器打开一下,不过也可以用 Python 来读取。Python 内置了一个叫open 的函数,接收文件名做参数,返回一个文件对象,你可以用它来读取文件。

>>> fin = open('words.txt')
>>> fin = open('words.txt')

fin is a common name for a file object used for input. The file object provides several methods for reading, including readline, which reads characters from the file until it gets to a newline and returns the result as a string:

fin 是一个用来表示输入的文件的常用名字。这个文件对象提供了好几种读取的方法,包括逐行读取,这种方法是读取文本中的一整行直到结尾,然后把读取的内容作为字符串返回:

>>> fin.readline()
>>> fin.readline()

The first word in this particular list is “aa”, which is a kind of lava. The sequence \r\n represents two whitespace characters, a carriage return and a newline, that separate this word from the next. The file object keeps track of where it is in the file, so if you call readline again, you get the next word:

这一列当中的第一个词就是『aa』了,这是一种熔岩(译者注:“aa”是夏威夷词汇,音“阿阿”,用来描述表面粗糙的熔岩流。译者本人就是地学专业的,都很少接触这个词,本教材作者真博学啊)。后面跟着的\r\n 的意思代表着有两个转义字符,一个是回车,一个是换行,这样把这个单词从下一个单词分隔开来。

文件对象会记录这个单词在源文件中的位置,所以下次你再调用 readline 的时候,就能得到下一个词了:

>>> fin.readline()
>>> fin.readline()

The next word is “aah”, which is a perfectly legitimate word, so stop looking at me like that. Or, if it’s the whitespace that’s bothering you, we can get rid of it with the string method strip:

下一个词是『aah』,这完全是一个正规的词汇,不要怪异眼神看我哦。另外如果转义字符让你很烦,咱们可以稍加修改来去掉它,用字符串的 strip 方法即可:

>>> line = fin.readline()
>>> line = fin.readline()
>>> word = line.strip()
>>> word = line.strip()
>>> word
>>> word

You can also use a file object as part of a for loop. This program reads words.txt and prints each word, one per line:

在 for 循环中也可以使用文件对象。下面的这个程序读取整个 words.txt 文件,然后每行输出一个词:

fin = open('words.txt')
    for line in fin:
        word = line.strip()

9.2 Exercises 练习

There are solutions to these exercises in the next section. You should at least attempt each one before you read the solutions.


Exercise 1 练习1

Write a program that reads words.txt and prints only the words with more than 20 characters (not counting whitespace).

写一个程序读取 words.txt,然后只输出超过20个字母长度的词(这个长度不包括转义字符)。

Exercise 2 练习2

In 1939 Ernest Vincent Wright published a 50,000 word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do.

在1939年,作家厄尔尼斯特·文森特·莱特曾经写过一篇5万字的小说《葛士比》,里面没有一个字母e。因为在英语中 e 是用的次数最多的字母,所以这很不容易的。

In fact, it is difficult to construct a solitary thought without using that most common symbol. It is slow going at first, but with caution and hours of training you can gradually gain facility.


All right, I’ll stop now. Write a function called has_no_e that returns True if the given word doesn’t have the letter “e” in it.


写一个名字叫做 has_no_e 的函数,如果给定词汇不含有 e 就返回真,否则为假。

Modify your program from the previous section to print only the words that have no “e” and compute the percentage of the words in the list that have no “e”.

修改一下上一节的程序代码,让它只打印单词表中没有 e 的词汇,并且统计一下这些词汇在总数中的百分比例。

Exercise 3 练习3

Write a function named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters. Modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them. Can you find a combination of 5 forbidden letters that excludes the smallest number of words?

写一个名叫 avoids 的函数,接收一个单词和一个禁用字母组合的字符串,如果单词不含有该字符串中的任何字母,就返回真。


Exercise 4 练习4

Write a function named uses_only that takes a word and a string of letters, and that returns True if the word contains only letters in the list. Can you make a sentence using only the letters acefhlo? Other than “Hoe alfalfa”?

写一个名叫uses_only的函数,接收一个单词和一个字母字符串,如果单词仅包含该字符串中的字母,就返回真。你能仅仅用 acefhlo 这几个字母造句子么?或者试试『Hoe alfalfa』?

Exercise 5 练习5

Write a function named uses_all that takes a word and a string of required letters, and that returns True if the word uses all the required letters at least once. How many words are there that use all the vowels aeiou? How about aeiouy?

写一个名字叫uses_all的函数,接收一个单词和一个必需字母组合的字符串,如果单词对必需字母组合中的字母至少都用了一次就返回真。有多少单词都用到了所有的元音字母 aeiou?aeiouy的呢?

Exercise 6 练习6

Write a function called is_abecedarian that returns True if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?


9.3 Search 搜索

All of the exercises in the previous section have something in common; they can be solved with the search pattern we saw in Section 8.6. The simplest example is:


def has_no_e(word):
    for letter in word:
        if letter == 'e':
            return False
    return True

The for loop traverses the characters in word. If we find the letter “e”, we can immediately return False; otherwise we have to go to the next letter. If we exit the loop normally, that means we didn’t find an “e”, so we return True.

这个 for 循环遍历了单词的所有字母。如果找到了字母e,就立即返回假;否则就到下一个字母。如果正常退出了循环,意味着我们没找到 e,就返回真。

You could write this function more concisely using the in operator, but I started with this version because it demonstrates the logic of the search pattern. avoids is a more general version of has_no_e but it has the same structure:

你可以使用 in 运算符,把这个函数写得更精简,我之所以用一个稍显麻烦的版本,是为了说明搜索模式的逻辑过程。

avoids 是一个更通用版本的has_no_e函数的实现,它的结构是一样的:

def avoids(word, forbidden):
    for letter in word:
        if letter in forbidden:
            return False
    return True

We can return False as soon as we find a forbidden letter; if we get to the end of the loop, we return True. uses_only is similar except that the sense of the condition is reversed:



def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True

Instead of a list of forbidden letters, we have a list of available letters. If we find a letter in word that is not in available, we can return False. uses_all is similar except that we reverse the role of the word and the string of letters:



def uses_all(word, required):
    for letter in required:
        if letter not in word:
            return False
    return True

Instead of traversing the letters in word, the loop traverses the required letters. If any of the required letters do not appear in the word, we can return False. If you were really thinking like a computer scientist, you would have recognized that uses_all was an instance of a previously-solved problem, and you would have written:


def uses_all(word, required):
    return uses_only(required, word)

This is an example of a program development plan called reduction to a previously-solved problem, which means that you recognize the problem you are working on as an instance of a solved problem and apply an existing solution.


9.4 Looping with indices 带索引循环

I wrote the functions in the previous section with for loops because I only needed the characters in the strings; I didn’t have to do anything with the indices. For is_abecedarian we have to compare adjacent letters, which is a little tricky with a for loop:

上面的章节中我写了各种用 for 循环的函数,因为当时只需要字符串中的字符;这就不需要理会索引。

但is_abecedarian这个函数中,我们需要对比临近的两个字母,所以用 for 循环就不那么好写了:

def is_abecedarian(word):
    previous = word[0]
    for c in word:
        if c < previous:
            return False
        previous = c
    return True

An alternative is to use recursion:


def is_abecedarian(word):
    if len(word) <= 1:
        return True
    if word[0] > word[1]:
        return False
    return is_abecedarian(word[1:])

Another option is to use a while loop:

另外一种方法是用 while 循环:

def is_abecedarian(word):
    i = 0
    while i < len(word)-1:
        if word[i+1] < word[i]:
        return False
        i = i+1
    return True

The loop starts at i=0 and ends when i=len(word)-1. Each time through the loop, it compares the ith character (which you can think of as the current character) to the i+1th character (which you can think of as the next).

循环开始于 i 等于0,然后在 i 等于len(word)-1的时候结束。每次通过循环的时候,都对比第 i 个字符(你可以就当是当前字符)与第 i+1个字符(就当作下一个字符)。

If the next character is less than (alphabetically before) the current one, then we have discovered a break in the abecedarian trend, and we return False.


If we get to the end of the loop without finding a fault, then the word passes the test. To convince yourself that the loop ends correctly, consider an example like 'flossy'. The length of the word is 6, so the last time the loop runs is when i is 4, which is the index of the second-to-last character. On the last iteration, it compares the second-to-last character to the last, which is what we want.

如果一直到循环结尾都没有发现问题,这个词就通过检验了。为了确信循环正确结束了,可以拿单词『flossy』作为例子来试试。单词长度是6,所以循环终止的时候 i 应该是4,也就是倒数第二个位置。在最后一次循环中,比较的是倒数第二个和最后一个字母,这正是符合我们设计的。

Here is a version of is_palindrome (see Exercise 3) that uses two indices; one starts at the beginning and goes up; the other starts at the end and goes down.


def is_palindrome(word):
    i = 0
    j = len(word)-1
    while i<j:
        if word[i] != word[j]:
            return False
        i = i+1
        j = j-1
    return True

Or we could reduce to a previously-solved problem and write:


def is_palindrome(word):
    return is_reverse(word, word)

Using is_reverse from Section 8.11.


9.5 Debugging 调试

Testing programs is hard. The functions in this chapter are relatively easy to test because you can check the results by hand. Even so, it is somewhere between difficult and impossible to choose a set of words that test for all possible errors.


Taking has_no_e as an example, there are two obvious cases to check: words that have an ‘e’ should return False, and words that don’t should return True. You should have no trouble coming up with one of each.

比如以has_no_e为例,有两种情况用来检查:有 e 的单词应该返回假,不包含 e 的单词要返回真。你自己想出几个这样的单词来检验一下并不难。

Within each case, there are some less obvious subcases. Among the words that have an “e”, you should test words with an “e” at the beginning, the end, and somewhere in the middle. You should test long words, short words, and very short words, like the empty string. The empty string is an example of a special case, which is one of the non-obvious cases where errors often lurk.

在每个分支内,有一些不那么清晰的次级分支。在那些有 e 的单词中,你还要检测单词中的 e 是在开头结尾还是中间位置。你得试试长词、短词,甚至特别短的词,比如空字符串。空字符串是一个典型特例,这个情况很容易被忽视而成为潜伏的隐患。

(译者注:我知道,这段翻译的简直就是 shit,但是没办法,我这会眼睛特别疼,思路不太清楚,另外这几个练习也不是很难,大家很容易自己搞定。)

In addition to the test cases you generate, you can also test your program with a word list like words.txt. By scanning the output, you might be able to catch errors, but be careful: you might catch one kind of error (words that should not be included, but are) and not another (words that should be included, but aren’t).

除了你自己设计的测试案例之外,你也可以用一个单词列表比如 words.txt 之类的来测试一下你的程序。通过扫描一下输出内容,你也许能够发现错误的地方,但一定要小心:你有可能发现某一种特定错误,但忽略了另外一个,比如包含了不应该包含的单词,但很难发现应该包含但遗漏了单词的情况。

In general, testing can help you find bugs, but it is not easy to generate a good set of test cases, and even if you do, you can’t be sure your program is correct. According to a legendary computer scientist: Program testing can be used to show the presence of bugs, but never to show their absence! — Edsger W. Dijkstra


程序测试可以用来表明 bug 的存在,但永远不能表明 bug 不存在。

— Edsger W. Dijkstra

9.6 Glossary 术语列表

file object: A value that represents an open file.


reduction to a previously-solved problem: A way of solving a problem by expressing it as an instance of a previously-solved problem.


special case: A test case that is a typical or non-obvious (and less likely to be handled correctly).


9.7 Exercises 练习

Exercise 7 练习7

This questionis based on a Puzzler that was broadcast on the radio program Car Talk : Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the ‘i’ that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word? Write a program to find it. Solution

这个问题基于一个谜语,这个谜语在广播节目 Car Talk 上面播放过:

给我一个有三个连续双字母的单词。我会给你一对基本符合的单词,但并不符合。例如, committee 这个单词,C O M M I T E。如果不是有单独的一个 i 在里面,就基本完美了,或者Mississippi 这个词:M I S I S I P I。如果把这些个 i 都去掉就好了。但有一个词正好是三个重叠字母,而且据我所知这个词可能是唯一一个这样的词。当然有有可能这种单词有五百多个呢,但我只能想到一个。是哪个词呢?写个程序来找一下这个词吧。


Exercise 8 练习8

Here’s another Car Talk Puzzler : “I was driving on the highway the other day and I happened to notice my odometer. Like most odometers, it shows six digits, in whole miles only. So, if my car had 300,000 miles, for example, I’d see 3-0-0-0-0-0. “Now, what I saw that day was very interesting. I noticed that the last 4 digits were palindromic; that is, they read the same forward as backward. For example, 5-4-4-5 is a palindrome, so my odometer could have read 3-1-5-4-4-5. “One mile later, the last 5 numbers were palindromic. For example, it could have read 3-6-5-4-5-6. One mile after that, the middle 4 out of 6 numbers were palindromic. And you ready for this? One mile later, all 6 were palindromic! “The question is, what was on the odometer when I first looked?” Write a Python program that tests all the six-digit numbers and prints any numbers that satisfy these requirements. Solution

这个又是一个 Car Talk 谜语:





写个 Python 程序来检测一下所有的六位数,然后输出一下满足这些要求的数字。 样例代码

Exercise 9 练习9

Here’s another Car Talk Puzzler you can solve with a search : “Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer. “When I got home I figured out that the digits of our ages have been reversible six times so far. I also figured out that if we’re lucky it would happen again in a few years, and if we’re really lucky it would happen one more time after that. In other words, it would have happened 8 times over all. So the question is, how old am I now?” Write a Python program that searches for solutions to this Puzzler. Hint: you might find the string method zfill useful. Solution

这个又是一个 Car Talk 谜语,你可以用搜索来解决:



写一个 Python 程序,搜索一下这个谜语的解。提示一下:你可能发现字符串的 zfill 方法很有用哦。


powered by Gitbook 该教程制作时间: 2016-03-25 13:29:01