mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-24 08:32:02 +00:00
fixes
This commit is contained in:
parent
d3e06e6cfd
commit
1f359b080e
@ -1,6 +1,6 @@
|
||||
---
|
||||
title: 'The Tests Are Passing, Why Would I Read The Diff Again?'
|
||||
image: 'https://blog-images.clickhouse.tech/en/2021/code-review/normie-duck.jpg'
|
||||
image: 'https://blog-images.clickhouse.tech/en/2021/code-review/two-ducks.jpg'
|
||||
date: '2021-04-14'
|
||||
author: '[Alexander Kuzmenkov](https://github.com/akuzm)'
|
||||
tags: ['code review', 'development']
|
||||
@ -18,9 +18,9 @@ The correct understanding is also important when modifying and extending softwar
|
||||
|
||||
How can we make our software easier to understand? It is often said that to see if you really understand something, you have to try explaining it to somebody. For example, as a science student taking an exam, you might be expected to give an explanation to some well-known observed effect, deriving it from the basic laws of this domain. In a similar way, if we are modeling some problem in software, we can start from domain knowledge and general programming knowledge, and build an argument as to why our model is applicable to the problem, why it is correct, has optimal performance and so on. This explanation takes the form of code comments, or, at a higher level, design documents.
|
||||
|
||||
If you have a habit of thoroughly commenting your code, you might have noticed that writing the comments is often much harder than writing the code itself. It also has an unpleasant side effect -- at times, while writing a comment, it becomes increasingly clear to you that the code is incomprehensible and takes forever to explain, or maybe is downright wrong, and you have to rewrite it. This is exactly the major positive effect of writing the comments. It helps you find bugs and make the code more understandable, and you wouldn't have noticed these problems unless you tried to explain the code.
|
||||
If you have a habit of thoroughly commenting your code, you might have noticed that writing the comments is often much harder than writing the code itself. It also has an unpleasant side effect — at times, while writing a comment, it becomes increasingly clear to you that the code is incomprehensible and takes forever to explain, or maybe is downright wrong, and you have to rewrite it. This is exactly the major positive effect of writing the comments. It helps you find bugs and make the code more understandable, and you wouldn't have noticed these problems unless you tried to explain the code.
|
||||
|
||||
Understanding why your program works is inseparable from understanding why it fails, so it's no surprise that there is a similar process for the latter, called "rubber duck debugging". To debug a particularly nasty bug, you start explaining the program logic step by step to an imaginary partner or even to an inanimate object such as a yellow rubber duck. This process is often very effective, much in excess of what one would expect given the limited conversational abilities of rubber ducks. The underlying mechanism is probably the same as with comments &emdash; you start to understand your program better by just trying to explain it, and this lets you find bugs.
|
||||
Understanding why your program works is inseparable from understanding why it fails, so it's no surprise that there is a similar process for the latter, called "rubber duck debugging". To debug a particularly nasty bug, you start explaining the program logic step by step to an imaginary partner or even to an inanimate object such as a yellow rubber duck. This process is often very effective, much in excess of what one would expect given the limited conversational abilities of rubber ducks. The underlying mechanism is probably the same as with comments — you start to understand your program better by just trying to explain it, and this lets you find bugs.
|
||||
|
||||
When working in a team, you even have a luxury of explaining your code to another developer who works on the same project. It's probably more entertaining than talking to a duck. More importantly, they are going to maintain the code you wrote, so better make sure that _they_ can understand it as well. A good formal occasion for explaining how your code works is the code review process. Let's see how you can get the most out of it, in terms of making your code understandable.
|
||||
|
||||
@ -28,14 +28,14 @@ When working in a team, you even have a luxury of explaining your code to anothe
|
||||
|
||||
Code review is often framed as a gatekeeping process, where each contribution is vetted by maintainers to ensure that it is in line with project direction, has acceptable quality, meets the coding guidelines and so on. This perspective might seem natural when dealing with external contributions, but makes less sense if you apply it to internal ones. After all, our fellow maintainers have perfect understanding of project goals and guidelines, probably they are more talented and experienced than us, and can be trusted to produce the best solution possible. How can an additional review be helpful?
|
||||
|
||||
<img src="https://blog-images.clickhouse.tech/en/2021/code-review/hidden-items.png"/>
|
||||
|
||||
A less-obvious, but very important, part of reviewing the code is just seeing whether it can be understood by another person. It is helpful regardless of the administrative roles and programming proficiency of the parties. What should you do as a reviewer if ease of understanding is your main priority?
|
||||
|
||||
You probably don't need to be concerned with trivia such as code style. There are automated tools for that. You might find some bugs, but this is probably a side effect. Your main task is making sense of the code.
|
||||
|
||||
Start with checking the high-level description of the problem that the pull request is trying to solve. Read the description of the bug it fixes, or the docs for the feature it adds. For bigger features, there is normally a design document that describes the overall implementation without getting too deep into the code details. After you understand the problem, start reading the code. Does it make sense to you? You shouldn't try too hard to understand it. Imagine that you are tired and under time pressure. If you feel you have to make a lot of effort to understand the code, ask the author for clarifications. As you talk, you might discover that the code is not correct, or it may be rewritten in a more straightforward way, or it needs more comments.
|
||||
|
||||
<img src="https://blog-images.clickhouse.tech/en/2021/code-review/hidden-items.png"/>
|
||||
|
||||
After you get the answers, don't forget to update the code and the comments to reflect them. Don't just stop after getting it explained to you personally. If you had a question as a reviewer, chances are that other people will also have this question later, but there might be nobody around to ask. They will have to resort to `git blame` and re-reading the entire pull request or several of them. Code archaeology is sometimes fun, but it's the last thing you want to do when you are investigating an urgent bug. All the answers should be on the surface.
|
||||
|
||||
Working with the author, you should ensure that the code is mostly obvious to anyone with basic domain and programming knowledge, and all non-obvious parts are clearly explained.
|
||||
@ -62,22 +62,22 @@ It is common to hear objections to the idea of commenting the code, so let's dis
|
||||
|
||||
### Self-documenting Code
|
||||
|
||||
You can often see a perplexing idea that the source code can somehow be "self-documenting", or that the comments are a "code smell", and their presence indicates that the code is badly written. I have trouble imagining how this belief can be compatible with any experience in maintaining sufficiently complex and large software, over the years, in collaboration with others. The code and the comments describe different parts of the solution. The code describes the data structures and their transformations, but it cannot convey meaning. The names in the code serve as pointers that map the data and its transformations to the domain concepts, but they are schematic and lack nuance. It is not so difficult to write code that makes it easy to understand what's going on in terms of data manipulation. What it takes is mostly moderation, that is, stopping yourself from being too clever. For most code, it is easy to see what it does, but why? Why this way and not that way? Why is it correct? Why this fast path here helps? Why did you choose this data layout? How is this invariant guaranteed? Why do you need this data now and not later? And so on. This might be not so evident for a developer who is working alone on a short-lived project, because they have all the necessary context in their head. But when they have to work with other people (or even with themselves from past and future), or in an unfamiliar area, the importance of non-code, higher-level context becomes painfully clear. The idea that we should, or even can, somehow encode comments such as [this one](https://github.com/postgres/postgres/blob/55dc86eca70b1dc18a79c141b3567efed910329d/src/backend/optimizer/path/indxpath.c#L2226) into names or control flow is just absurd.
|
||||
You can often see a perplexing idea that the source code can somehow be "self-documenting", or that the comments are a "code smell", and their presence indicates that the code is badly written. I have trouble imagining how this belief can be compatible with any experience in maintaining sufficiently complex and large software, over the years, in collaboration with others. The code and the comments describe different parts of the solution. The code describes the data structures and their transformations, but it cannot convey meaning. The names in the code serve as pointers that map the data and its transformations to the domain concepts, but they are schematic and lack nuance. It is not so difficult to write code that makes it easy to understand what's going on in terms of data manipulation. What it takes is mostly moderation, that is, stopping yourself from being too clever. For most code, it is easy to see what it does, but why? Why this way and not that way? Why is it correct? Why this fast path here helps? Why did you choose this data layout? How is this invariant guaranteed? And so on. This might be not so evident for a developer who is working alone on a short-lived project, because they have all the necessary context in their head. But when they have to work with other people (or even with themselves from past and future), or in an unfamiliar area, the importance of non-code, higher-level context becomes painfully clear. The idea that we should, or even can, somehow encode comments such as [this one](https://github.com/postgres/postgres/blob/55dc86eca70b1dc18a79c141b3567efed910329d/src/backend/optimizer/path/indxpath.c#L2226) into names or control flow is just absurd.
|
||||
|
||||
### Obsolete Comments
|
||||
|
||||
The comments can't be checked by the compiler or the tests, so there is no automated way to make sure that they are up to date with the rest of the comments and the code. The possibility of comments gradually getting incorrect is sometimes used as an argument against having any comments at all.
|
||||
|
||||
This problem is not exclusive to the comments -- the code also can and does become obsolete. Simple cases such as dead code can be detected by static analysis or studying the test coverage of code. More complex cases can only be found by proofreading, such as maintaining an invariant that is not important anymore, or preparing some data that is not needed.
|
||||
This problem is not exclusive to the comments — the code also can and does become obsolete. Simple cases such as dead code can be detected by static analysis or studying the test coverage of code. More complex cases can only be found by proofreading, such as maintaining an invariant that is not important anymore, or preparing some data that is not needed.
|
||||
|
||||
While an obsolete comment can lead to a mistake, the same applies, perhaps more strongly, to the lack of comments. When you need some higher-level knowledge about the code, but it is not written down, you are forced to perform an entire investigation from first principles to understand what's going on, and this is error-prone. Even an obsolete comment likely gives a better starting point than nothing. Moreover, in a code base that makes an active use of the comments, they tend to be mostly correct. This is because the developers rely on comments, read and write them, pay attention to them during code review. The comments are routinely changed along with changing the code, and the outdated comments are soon noticed and fixed. This does require some habit. A lone comment in a vast desert of impenetrable self-documenting code is not going to fare well.
|
||||
|
||||
|
||||
## Conclusion
|
||||
|
||||
Code review makes your software better, and a significant part of this probably comes from trying to understand what your software actually does. By paying attention specifically to this aspect of code review, you can make it even more efficient. You'll have less bugs, and your code will be easier to maintain -- and what else could we ask for as software developers?
|
||||
Code review makes your software better, and a significant part of this probably comes from trying to understand what your software actually does. By paying attention specifically to this aspect of code review, you can make it even more efficient. You'll have less bugs, and your code will be easier to maintain — and what else could we ask for as software developers?
|
||||
|
||||
|
||||
_2021-04-13 [Alexander Kuzmenkov](https://github.com/akuzm)_
|
||||
_2021-04-13 [Alexander Kuzmenkov](https://github.com/akuzm). Title photo by [Nikita Mikhaylov](https://github.com/nikitamikhaylov)_
|
||||
|
||||
_P.S. This text contains the personal opinions of the author, and is not an authoritative manual for ClickHouse maintainers._
|
||||
|
Loading…
Reference in New Issue
Block a user