第八章-处理旧版本代码


在本章我们将讨论一下话题:

  • 阅读Django代码
  • 探索相关文档
  • 增量变更还是完全重写
  • 改变代码之前编写测试
  • 旧版本数据库的集成

It sounds exciting when you are asked to join a project. Powerful new tools and cutting-edge technologies might await you. However, quite often, you are asked to work with an existing, possibly ancient, codebase.

加入项目时你可能很感兴趣。

To be fair, Django has not been around for that long. However, projects written for older versions of Django are sufficiently different to cause concern. 􏰆o􏰀eti􏰀es, having the entire source code and documentation might not be enough.
公平起见,Django并没有一直这样做。不过,项目

􏰇Yeah,􏰇 said Brad, 􏰇Where is Hart?􏰇 Mada􏰀 􏰚 hesitated and replied, "Well, he resigned. Being the head of IT security, he took moral responsibility of the perimeter breach." Steve, evidently shocked, was shaking his head. "I am sorry," she continued, "But I have been assigned to head SuperBook and ensure that we have no roadblocks to meet the new deadline."

There was a collective groan. Undeterred, Madam O took one of the sheets and began, "It says here that the Remote Archive module is the most high-priority item in the incomplete status. I believe Evan is working on this." If you are asked to recreate the environment, then you might need to fumble with the 􏰚􏰆 configuration, database settings, and running services locally or on the network. There are so many pieces to this puzzle that you might wonder how and where to start.

Understanding the Django version used in the code is a key piece of information. As Django evolved, everything from the default project structure to the recommended best practices have changed. Therefore, identifying which version of Django was used is a vital piece in understanding it.

理解Django的版本再代码中的使用是很关键的信息。随着Django的进化,所有来自默认项目结构的建议最佳实践都发生了变化。因此,认出Django在使用的是哪一个版本是理解这个框架中重要的一环。

Change of Guards

Sitting patiently on the ridiculously short beanbags in the training room, the SuperBook team waited for Hart. He had convened an emergency go-live meeting. Nobody understood the "emergency" part since go live was at least 3 months away.

Madam O rushed in holding a large designer coffee mug in one hand and a bunch of printouts of what looked like project timelines in the other. Without looking up she said, "We are late so I will get straight to the point. In the light of last week's attacks, the board has decided to summarily expedite the SuperBook project and has set the deadline to end of next 􏰀onth. 􏰅ny questions?􏰇

Yeah,􏰇 said Brad, 􏰇Where is Hart?􏰇 Mada􏰀 􏰚 hesitated and replied, "Well, he resigned. Being the head of IT security, he took moral responsibility of the perimeter breach." Steve, evidently shocked, was shaking his head. "I am sorry," she continued, "But I have been assigned to head SuperBook and ensure that we have no roadblocks to meet the new deadline."

There was a collective groan. Undeterred, Madam O took one of the sheets and began, "It says here that the Remote Archive module is the most high-priority item in the incomplete status. I believe Evan is working on this."

"That's correct," said Evan from the far end of the room. "Nearly there," he smiled at others, as they shifted focus to him. Madam O peered above the rim of her glasses and smiled almost too politely. "Considering that we already have an extremely well-tested and working Archiver in our Sentinel code base, I would recommend that you leverage that instead of creating another redundant system."

"But," Steve interrupted, "it is hardly redundant. We can improve over a legacy archiver, can't we?􏰇 􏰇If it isn't broken, then don't fix it􏰇, replied Madam O tersely. He said, "He is working on it," said Brad almost shouting, 􏰇What about all that work he has already finished?􏰇

􏰇􏰍van, how 􏰀uch of the work have you co􏰀pleted so far?􏰇 asked 􏰚, rather impatiently. "About 12 percent," he replied looking defensive. 􏰍veryone looked at hi􏰀 incredulously. 􏰇What? That was the hardest 􏰛􏰜 percent" he added.

O continued the rest of the meeting in the same pattern. Everybody's work was reprioriti􏰄ed and shoe􏰃horned to fit the new deadline. 􏰅s she picked up her papers, readying to leave she paused and removed her glasses.

"I know what all of you are thinking... literally. But you need to know that we had no choice about the deadline. All I can tell you now is that the world is counting on you to meet that date, somehow or other." Putting her glasses back on, she left the room.

"I a􏰀m definitely going to bring 􏰀my tinfoil hat,􏰇" said E􏰍van loudly to himself.

查找Django的版本信息

理论上,每个项目在根目录都拥有一个requirements.txt文件,或者 一个setup.py文件,这个文件将表明Django再项目中要使用的是哪一个版本。让我们来看看于此相关的文件片段:

Django==1.5.9

注意,版本数字已经完全提过了的称作链(相对于Django>=1.5.9)。对每一个包都进行链接是一个好习惯,因为它能够减少人们的疑问,让你更为确定版本信息。

Unfortunately, there are real-world codebases where the requirements.txt file was not updated or even completely missing. In such cases, you will need to probe for various tell􏰃tale signs to find out the exact version.

不幸的是,

激活虚拟环境

In most cases, a Django project would be deployed within a virtual environment. Once you locate the virtual environment for the project, you can activate it by jumping to that directory and running the activated script for your OS. For Linux, the command is as follows:

再很多情况下,Django的项目会部署在一个虚拟环境中。一旦你找出了项目的虚拟环境,你可以通过跳过这个目录,并为系统运行激活的脚本。对于Linux来说,可以使用如下命令:

$ source venv_path/bin/activate

Once the virtual environment is active, start a Python shell and query the Django version as follows:

只要虚拟环境一激活,你就可以启动Python终端,然后像我这样查询Django的版本:

$ python
>>> import django
>>> print(django.get_version())
1.5.9

The Django version used in this case is Version 1.5.9.

本例中使用的Django版本是 1.5.9.

Alternatively, you can run the manage.py script in the project to get a similar output:

可选择的是,你可以在项目中运行脚本 manage.py 以获取类似的输出内容:

$ python manage.py --version
1.5.9

However, this option would not be available if the legacy project source snapshot was sent to you in an undeployed form. If the virtual environment (and packages) was also included, then you can easily locate the version number (in the form of a tuple) in the __init__.py file of the Django directory. For exa􏰀mple􏰂:

不过呢,要是在未部署的情况下,之前遗留的项目源码镜像被发送给你了,那么这个选项是不可用的。如果虚拟环境(以及包)被包括在内,那么你在Django目录中的__init__.py文件简单地找到版本号(以元组地形式出现)。例如:

$ cd envs/foo_env/lib/python2.7/site-packages/django
$ cat __init__.py
VERSION = (1, 5, 9, 'final', 0)
...

If all these methods fail, then you will need to go through the release notes of the past Django versions to deter􏰀ine the identifiable changes 􏰋form exa􏰀ple, the AUTH_PROFILE_MODULE setting was deprecated since Version 1.5) and match them to your legacy code. Once you pinpoint the correct Django version, then you can move on to analyzing the code.

如果,所有这些方法都不管用,那么你需要

文件都放在哪里了?这可不是PHP啊

最大的一个困难点是用到的场合,特别是如果你来自PHP或者APS.NET的世界,即,源文件并不位于web服务器的文档根目录,而目录却通常命名为wwwroot或者public_html。此外,代码的目录结构和网站的URL结构之间并没有直接的关系。

In fact, you will find that your Django website's source code is stored in an obscure path such as /opt/webapps/my-django-app. Why is this? 􏰅􏰀ong 􏰀any good reasons, it is often 􏰀ore secure to 􏰀ove your confidential data outside your public webroot. This way, a web crawler would not be able to accidentally stumble into your source code directory.

实际上,你会发现自己的Django站点源码被存储在了一个使人难以理解的的路径中,比如,/opt/webapps/my-django-app。为什么会这样?

As you would read in the Chapter 11, Production-ready the location of the source code can be found by exa􏰀ining your web server's configuration file. Here, you will find either the environment variable DJANGO_SETTINGS_MODULE being set to the module's path, or it will pass on the request to a W􏰆􏰢I server that will be configured to point to your project.wsgi file.
和你之前在第十一章读到的那样,

Starting with urls.py

Even if you have access to the entire source code of a Django site, figuring out how it works across various apps can be daunting. It is often best to start from the root urls.py URLconf file since it is literally a 􏰀ap that ties every request to the respective views.

即使你访问了整个Django站点的源代码,要搞清楚url如何与多个应用交互是令人畏惧的。

With normal Python programs, I often start reading from the start of its execution—say, from the top-level main module or wherever the main check idiom starts. In the case of Django applications, I usually start with urls.py since it is easier to follow the 􏰁ow of execution based on various URL patterns a site has.

In Linux, you can use the following find command to locate the settings.py file and the corresponding line specifying the root urls.py:

$ find . -iname settings.py -exec grep -H 'ROOT_URLCONF' {} \;
./projectname/settings.py:ROOT_URLCONF = 'projectname.urls'
$ ls projectname/urls.py
projectname/urls.py

Jumping around the code Reading code sometimes feels like browsing the web without the hyperlinks. When you encounter a function or variable defined elsewhere, then you will need to ju􏰀p to the file that contains that definition. 􏰆o􏰀e ID􏰍s can do this auto􏰀atically for you as long as you tell it which files to track as part of the project.

If you use 􏰍􏰀Emacs or 􏰊Vim􏰀 instead, then you can create a T􏰅􏰢􏰆AGS file to quickly navigate between files. 􏰢Go to the project root and run a tool called Exuberant Ctags as follows:

如果你使用的是Emacs或者Vim,那么你可以创建一个TAG文件来快速地在多个文件之间进行浏览。如下,切换目录到根目录,然后运行叫做Exuberant Ctags的工具:

find . -iname "*.py" -print | etags -

This creates a file called T􏰅􏰢􏰆AGS that contains the location infor􏰀ation, where every syntactic unit such as classes and functions are defined. In 􏰍􏰀Emacs, you can find the definition of the tag, where your cursor 􏰋or point as it called in 􏰍􏰀acs􏰌 is at using the M-. command.

While using a tag file is extre􏰀ely fast for large code bases, it is quite basic and is not aware of a virtual environ􏰀ent 􏰋where 􏰀ost definitions 􏰀ight be located􏰌. 􏰅n excellent alternative is to use the elpy package in 􏰍􏰀acs. It can be configured to detect a virtual environ􏰀ent. 􏰈u􏰀ping to a definition of a syntactic ele􏰀ent is using the same M-. co􏰀􏰀and. However, the search is not restricted to the tag file. 􏰆o, you can even ju􏰀p to a class definition within the Django source code sea􏰀lessly.

Understanding the code base

It is quite rare to find legacy code with good documentation. Even if you do, the documentation might be out of sync with the code in subtle ways that can lead to further issues. Often, the best guide to understand the application's functionality is the executable test cases and the code itself.

The official Django docu􏰀entation has been organi􏰄ed by versions at https://docs. djangoproject.com. On any page, you can quickly switch to the corresponding page in the previous versions of Django with a selector on the bottom right-hand section of the page:

图片:略

In the same way, documentation for any Django package hosted on readthedocs. org can also be traced back to its previous versions. For example, you can select the documentation of django-braces all the way back to v1.0.0 by clicking on the selector on the bottom left-hand section of the page:

图片:略

Creating the big picture 描绘宏伟蓝图

Most people find it easier to understand an application if you show them a high-level diagram. While this is ideally created by someone who understands the workings of the application, there are tools that can create very helpful high-level depiction of a Django application.

很多人发现如果你对他们展示一个高级图表,那么他们会觉得更容易理解一个应用。而这个图表,理论上是由那些理解应用工作流程的人所创建,有很多工具可以创建非常富有帮助的对Django应用的高级描述。

A graphical overview of all models in your apps can be generated by the graph_models management command, which is provided by the django-command-extensions package. As shown in the following diagram, the model classes and their relationships can be understood at a glance:

应用里的全部模型的图形化的概览都可以通过管理命令graph_models生成,它通过包django-command-extensions实现。如下图所示,模型类以及这些模型之间的关系都可以一目了然:

图片:略

Model classes used in the SuperBook project connected by arrows indicating their relationships

This visualization is actually created using PyGraphviz. This can get really large for projects of even medium complexity. Hence, it might be easier if the applications are logically grouped and visualized separately.

实际上,可视化是使用PyGraphviz创建的。这张图可以变得很大,即使面对的时中型的复杂项目。因此,如果应用在逻辑上组织一起,而在视觉上独立的,那么也能够让人们的理解轻松些。

PyGraphviz Installation and Usage

If you find the installation of Py􏰢raphvi􏰄 challenging, then don't worry, you are not alone. Recently, I faced numerous issues while installing on Ubuntu, starting from Python 3 incompatibility to incomplete documentation. To save your time, I have listed the steps that worked for me to reach a working setup.

On Ubuntu, you will need the following packages installed to install PyGraphviz:

$ sudo apt-get install python3.4-dev graphviz libgraphviz-dev pkg-config

Now activate your virtual environment and run pip to install the development version of PyGraphviz directly from GitHub, which supports Python 3:

$ pip install git+http://github.com/pygraphviz/pygraphviz.git#egg=pygraphviz

Next, install django-extensions and add it to your INSTALLED_ APPS. Now, you are all set.

Here is a sa􏰀ple usage to create a 􏰢raph􏰊i􏰄 dot file for just two apps and to convert it to a PNG image for viewing:

$ python manage.py graph_models app1 app2 > models.dot
$ dot -Tpng models.dot -o models.png

增量变更还是完全重写?

Often, you would be handed over legacy code by the application owners in the earnest hope that most of it can be used right away or after a couple of minor tweaks. However, reading and understanding a huge and often outdated code base is not an easy job. Unsurprisingly, most programmers prefer to work on greenfield develop􏰀ent.

In the best case, the legacy code ought to be easily testable, well documented, and 􏰁exible to work in 􏰀odern environ􏰀ents so that you can start 􏰀aking incremental changes in no time. In the worst case, you might recommend discarding the existing code and go for a full rewrite. Or, as it is commonly decided, the short-term approach would be to keep making incremental changes, and a parallel long-term effort might be underway for a complete reimplementation.

A general rule of thumb to follow while taking such decisions is—if the cost of rewriting the application and maintaining the application is lower than the cost of maintaining the old application over time, then it is recommended to go for a rewrite. Care must be taken to account for all the factors, such as time taken to get new programmers up to speed, the cost of maintaining outdated hardware, and so on.

Sometimes, the complexity of the application domain becomes a huge barrier against a rewrite, since a lot of knowledge learnt in the process of building the older code gets lost. Often, this dependency on the legacy code is a sign of poor design in the application like failing to externalize the business rules from the application logic.

The worst form of a rewrite you can probably undertake is a conversion, or a mechanical translation from one language to another without taking any advantage of the existing best practices. In other words, you lost the opportunity to modernize the code base by removing years of cruft.

Code should be seen as a liability not an asset. As counter-intuitive as it might sound, if you can achieve your business goals with a lesser amount of code, you have dramatically increased your productivity. Having less code to test, debug, and maintain can not only reduce ongoing costs but also make your organization 􏰀ore agile and 􏰁exible to change.

Code is a liability not an asset. Less code is more maintainable.

Irrespective of whether you are adding features or trimming your code, you must not touch your working legacy code without tests in place.

做出任何的改变之前都应该做测试

In the book Working Effectively with Legacy Code, Michael Feathers defines legacy code as, simply, code without tests. He elaborates that with tests one can easily 􏰀odify the behavior of the code quickly and verifiably. In the absence of tests, it is impossible to gauge if the change made the code better or worse.

􏰚ften, we do not know enough about legacy code to confidently write a test. Michael recommends writing tests that preserve and document the existing behavior, which are called characterization tests.

Unlike the usual approach of writing tests, while writing a characterization test, you will first write a failing test with a du􏰀􏰀y output, say X, because you don't know what to expect. When the test harness fails with an error, such as "Expected output X but got Y", then you will change your test to expect Y. So, now the test will pass, and it becomes a record of the code's existing behavior.

Note that we might record buggy behavior as well. After all, this is unfamiliar code. Nevertheless, writing such tests are necessary before we start changing the code. Later, when we know the specifications and code better, we can fix these bugs and update our tests (not necessarily in that order).

编写测试的具体步骤

Writing tests before changing the code is similar to erecting scaffoldings before the restoration of an old building. It provides a structural framework that helps you confidently undertake repairs.

You 􏰀ight want to approach this process in a stepwise 􏰀anner as follows􏰂:

  1. Identify the area you need to make changes to. Write characterization tests focusing on this area until you have satisfactorily captured its behavior.

  2. Look at the changes you need to 􏰀ake and write specific test cases for those. Prefer smaller unit tests to larger and slower integration tests.

  3. Introduce incremental changes and test in lockstep. If tests break, then try to analyze whether it was expected. Don't be afraid to break even the characterization tests if that behavior is something that was intended to change.

If you have a good set of tests around your code, then you can quickly find the effect of changing your code.

换句话来说,如果你决定通过丢掉自己的代码而不是数据来重写,那么Django对于此事是颇有帮助的。

旧版本的数据库

There is an entire section on legacy databases in Django documentation and rightly so, as you will run into them many times. Data is more important than code, and databases are the repositories of data in most enterprises.

You can 􏰀oderni􏰄e a legacy application written in other languages or frameworks by importing their database structure into Django. As an immediate advantage, you can use the Django admin interface to view and change your legacy data.

Django makes this easy with the inspectdb management command, which looks as follows:

$ python manage.py inspectdb > models.py

当你的设置文件使用旧版本的数据库配置过了,这个命令可以自动地生成应用到模型文件的Python代码。

如果你正在把该方法集成到就旧版本数据库,这里给你一些最佳实践建议:

  • 预先了解Django ORM的限制。目前,多个列(合成)主键和非关系型数据库是不支持的。
  • 不要要记手动清理生成的模型,例如,移除Django自动创建的ID冗余字段。
  • 外键关系可能必须手工定义。在某些数据库中自动生成的模型会包含使用_id作为前缀的整数字段。
  • 将模型组织到独立到应用中。之后,在对应到文件夹中就可以轻松的添加视图,表单和测试了。
  • 记住在旧版本的数据库中运行迁移命令将创建Django的管理表(django_*auth_*)。

理想的情况中,自动创建的模型会立即运行起来的,不过在实际情况中,它的运行带来的是很多的尝试和错误。有时候,Django推断的数据类型并不合乎你的期望。另外的情况是,你想要对模型添加unique_together这样对元信息。

Eventually, you should be able to see all the data that was locked inside that aging PHP application in your familiar Django admin interface. I am sure this will bring a smile to your face.

总结

In this chapter, we looked at various techniques to understand legacy code. Reading code is often an underrated skill. But rather than reinventing the wheel, we need to judiciously reuse good working code whenever possible. In this chapter and the rest of the book, we emphasize the importance of writing test cases as an integral part of coding.

本章,我们浏览了多种技术以理解旧版本的代码。阅读代码是一个经常被低估的技能。不过,相比较于重复发明轮子,我们需要决断重复使用。本书的剩下章节,我们强调的是编写测试案例作为代码完整性的一部分。

In the next chapter, we will talk about writing test cases and the often frustrating task of debugging that follows.

下一章,我们要谈论编写测试用例,以及接下来的经常让人沮丧的调试任务。

powered by Gitbook该教程制作时间: 2016-02-04 17:40:02