[#]: collector: (lujun9972)
2021-03-10 15:06:20 +08:00
[#]: translator: (stevenzdg988)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (6 best practices for managing Git repos)
[#]: via: (https://opensource.com/article/20/7/git-repos-best-practices)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)
6 best practices for managing Git repos
======
Resist the urge to add things in Git that will make it harder to manage;
here's what to do instead.
![Working from home at a laptop][1]
Having access to source code makes it possible to analyze the security and safety of applications. But if nobody actually looks at the code, the issues won't get caught, and even when people are actively looking at code, there's usually quite a lot to look at. Fortunately, GitHub has an active security team, and recently, they [revealed a Trojan that had been committed into several Git repositories][2], having snuck past even the repo owners. While we can't control how other people manage their own repositories, we can learn from their mistakes. To that end, this article reviews some of the best practices when it comes to adding files to your own repositories.
### Know your repo
![Git repository terminal][3]
This is arguably Rule Zero for a secure Git repository. As a project maintainer, whether you started it yourself or you've adopted it from someone else, it's your job to know the contents of your own repository. You might not have a memorized list of every file in your codebase, but you need to know the basic components of what you're managing. Should a stray file appear after a few dozen merges, you'll be able to spot it easily because you won't know what it's for, and you'll need to inspect it to refresh your memory. When that happens, review the file and make sure you understand exactly why it's necessary.
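One way to make this habit concrete, offered as a sketch under our own assumptions rather than something the article prescribes, is to keep a reviewed manifest of the files you expect (the `KNOWN_FILES` name used below is hypothetical) and compare it against what Git actually tracks. The function reads the current list of paths on standard input and prints any path the manifest doesn't mention.

```
# unknown_files MANIFEST
# Reads a list of tracked paths on stdin and prints every path that does not
# appear in MANIFEST (one path per line). Both sides are sorted in the C
# locale because comm(1) requires identically sorted input.
unknown_files() (
  sorted=$(mktemp) || exit 1
  LC_ALL=C sort -u "$1" > "$sorted"
  # -1 hides paths only in the manifest, -3 hides paths present in both,
  # leaving only the paths that are new to you.
  LC_ALL=C sort -u | LC_ALL=C comm -13 "$sorted" -
  rc=$?
  rm -f "$sorted"
  exit "$rc"
)
```

After a round of merges, `git ls-files | unknown_files KNOWN_FILES` lists anything that arrived without your noticing. `git ls-files` quotes paths containing unusual characters unless you pass `-z`, so this line-based sketch may misreport such paths.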
### Ban binary blobs
![Git binary check command in terminal][4]
Git is meant for text, whether it's C or Python or Java written in plain text, or JSON, YAML, XML, Markdown, HTML, or something similar. Git isn't ideal for binary files.
It's the difference between this:
```
$ cat hello.txt
This is plain text.
It's readable by humans and machines alike.
Git knows how to version this.
$ git diff hello.txt
diff --git a/hello.txt b/hello.txt
index f227cc3..0d85b44 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1,2 +1,3 @@
 This is plain text.
+It's readable by humans and machines alike.
 Git knows how to version this.
```
and this:
```
$ git diff pixel.png
diff --git a/pixel.png b/pixel.png
index 563235a..7aab7bc 100644
Binary files a/pixel.png and b/pixel.png differ
$ cat pixel.png
�PNG
IHDR7n�$gAMA��
              �abKGD݊�tIME�
                          -2R��
IDA�c`�!�3%tEXtdate:create2020-06-11T11:45:04+12:00��r.%tEXtdate:modify2020-06-11T11:45:04+12:00��ʒIEND�B`�
```
The data in a binary file can't be parsed the way plain text can, so when anything in a binary file changes, Git can't show what changed and effectively stores the new version as a complete copy. The only difference between one version and the other is everything, which adds up quickly.
Worse still, binary data can't be reasonably audited by you, the Git repository maintainer. That's a violation of Rule Zero: know what's in your repository.
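Deciding that a file is binary is itself a heuristic. By default, Git treats content as binary if it finds a NUL byte in the first 8000 bytes, unless `.gitattributes` says otherwise. The function below is a minimal sketch that approximates that check in shell so you can scan files before committing them. It is not Git's own code, and it assumes a `head` that supports `-c` (GNU, BSD, and BusyBox versions do).

```
# looks_binary FILE
# Succeeds (exit status 0) if FILE contains a NUL byte in its first 8000
# bytes, approximating Git's default binary-detection heuristic.
looks_binary() {
  # tr -dc '\000' deletes every byte except NUL, so wc -c counts the NULs.
  nul_count=$(head -c 8000 "$1" | tr -dc '\000' | wc -c)
  # The unquoted expansion drops the padding some wc implementations add.
  [ $nul_count -gt 0 ]
}
```

For example, `for f in *; do looks_binary "$f" && echo "$f"; done` prints the likely binaries in the current directory.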
In addition to the usual [POSIX][5] tools, you can detect binaries using `git diff`. When you diff a binary file with the `--numstat` option, Git prints `-` in place of the added and deleted line counts it reports for text files:
```
$ git diff --numstat /dev/null pixel.png | tee
-     -   /dev/null => pixel.png
$ git diff --numstat /dev/null file.txt | tee
5788  0   /dev/null => file.txt
```
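Because a binary change shows up in `--numstat` output as `-` in both count columns, the marker is easy to filter. The sketch below is an illustration, not an official Git hook. It reads `git diff --numstat` output on standard input and prints the paths Git considers binary.

```
# list_binaries
# Reads `git diff --numstat` output (added<TAB>deleted<TAB>path) on stdin
# and prints the path of every entry whose counts are both "-", which is
# how Git marks a binary change.
list_binaries() {
  awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```

In a pre-commit hook (`.git/hooks/pre-commit`), you could capture `git diff --cached --numstat | list_binaries` and exit with a non-zero status when the result is non-empty. That stops a commit that stages a binary file. Because `git commit --no-verify` skips the hook, treat it as a guard rail rather than an enforced policy.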
If you're considering committing binary blobs to your repository, stop and think about it first. If it's binary, it was generated by something. Is there a good reason not to generate it at build time instead of committing it to your repo? Should you decide it does make sense to commit binary data, make sure you identify, in a README file or similar, where the binary files are, why they're binary, and what the protocol is for updating them. Updates must be performed sparingly, because every change you commit to a binary blob adds roughly another full copy of that blob to your repository's history.
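To see what binary history is already costing you, Git can list every object reachable from any ref along with its size. The filter below is a sketch that assumes input in the format produced by `git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`. It prints blobs at or above a size threshold, in bytes, with the largest first. `%(objectsize)` is the uncompressed size, not the space the object occupies in a packfile.

```
# large_blobs MIN_BYTES
# Reads "<type> <object-name> <size> <path>" lines on stdin and prints
# "<size> <path>" for every blob of at least MIN_BYTES, largest first.
large_blobs() {
  awk -v min="$1" '
    $1 == "blob" && $3 + 0 >= min + 0 {
      size = $3
      # Strip the first three fields so paths containing spaces survive.
      sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
      print size, $0
    }' | sort -rn
}
```

Run `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | large_blobs 1000000` to list every blob of 1 MB or more anywhere in history.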
### Keep third-party libraries third-party
Third-party libraries are no exception to this rule. While it's one of the many benefits of open source that you can freely re-use and re-distribute code you didn't write, there are many good reasons not to house a third-party library in your own repository. First of all, you can't exactly vouch for a third party unless you've reviewed all of its code (and future merges) yourself. Secondly, when you copy third-party libraries into your Git repo, it splinters focus away from the true upstream source. Someone confident in the library is technically only confident in the master copy of the library, not in a copy lying around in a random repo. If you need to lock into a specific version of a library, either provide developers with a reasonable URL for the release your project needs or else use [Git submodules][6].
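A submodule records the exact upstream commit your project depends on, so drift from that pin can be detected. `git submodule status` starts each line with a flag: `-` (not initialized), `+` (the checked-out commit differs from the one recorded in your repository), `U` (merge conflicts), or a space when everything matches. The function below is a sketch that reads that output and prints the flag and path of every submodule that isn't clean. It assumes submodule paths contain no spaces.

```
# drifted_submodules
# Reads `git submodule status` output on stdin. Each line looks like
#   <flag><commit> <path> [(<describe>)]
# and the function prints "<flag> <path>" for every line whose flag is
# not a space.
drifted_submodules() {
  awk '{
    flag = substr($0, 1, 1)
    # Default field splitting skips the leading space of clean entries,
    # so the path is always the second field.
    if (flag != " ") print flag, $2
  }'
}
```

Pinning a library might look like `git submodule add https://example.com/libfoo.git vendor/libfoo`, followed by `git -C vendor/libfoo checkout v1.2.3` and a commit of `vendor/libfoo`. The URL, path, and tag are placeholders. After that, empty output from `git submodule status | drifted_submodules` means every checkout matches its pin.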
### Resist a blind git add
![Git manual add command in terminal][7]
If your project is compiled, resist the urge to use `git add .` (where `.` is the current directory, or a path to a specific folder in its place) as an easy way to add anything and everything new. This is especially important if you're not manually compiling your project, but are using an IDE to manage your project for you. It can be extremely difficult to track what gets added to your repository when an IDE manages your project, so it's important to only add what you've actually written and not any new object that pops up in your project folder.
If you do use `git add .`, review what's in staging before you push. If you see an unfamiliar object in your project folder when you do a `git status`, find out where it came from and why it's still in your project directory after you've run a `make clean` or equivalent command. It's a rare build artifact that won't regenerate during compilation, so think twice before committing it.
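Part of that review can be scripted. `git status --porcelain` prints a format that Git promises not to change incompatibly between versions. In that format, untracked paths are marked `??` and newly staged files have `A` in the first column. The sketch below prints both kinds of path. These are the paths a blind `git add .` would sweep in, or has already swept in.

```
# new_paths
# Reads `git status --porcelain` output on stdin and prints every path that
# is untracked ("??") or newly added to the index ("A" in the first column).
new_paths() {
  # After the first substitution prints a line, "t" jumps to the end of
  # the script, so the same line is never matched twice.
  sed -n -e 's/^?? //p' -e 't' -e 's/^A. //p'
}
```

Before staging anything, `git add --dry-run .` shows what `git add .` would add without actually adding it. After staging, `git status --porcelain | new_paths` gives a short list of new files to account for. Porcelain output quotes paths that contain unusual characters, and this line-based sketch doesn't decode them.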
### Use Git ignore
![Git ignore command in terminal][8]
Many of the conveniences built for programmers are also very noisy. The typical project directory for any project, programming, artistic, or otherwise, is littered with hidden files, metadata, and leftover artifacts. You can try to ignore these objects, but the more noise there is in your `git status`, the more likely you are to miss something.
You can have Git filter out this noise for you by maintaining a good gitignore file. Because that's a common requirement for anyone using Git, there are a few starter gitignore files available. [GitHub.com/github/gitignore][9] offers several purpose-built gitignore files you can download and place into your own project, and [GitLab.com][10] integrated gitignore templates into the repo creation workflow several years ago. Use these to help you build a reasonable gitignore policy for your project, and stick to it.
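As an illustration, here is a small shell function that writes a starter gitignore into a project directory. The patterns are assumptions for a compiled project that builds into `build/` and is edited in Vim, a JetBrains IDE, or VS Code. They are not a recommended standard, so adapt them using the templates mentioned above.

```
# write_gitignore [DIR]
# Writes a starter .gitignore into DIR (default: current directory).
# Refuses to overwrite an existing file so local rules are never lost.
write_gitignore() {
  target=${1:-.}/.gitignore
  if [ -e "$target" ]; then
    echo "write_gitignore: $target already exists" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
# Compiler output and build directories
*.o
build/

# Editor and IDE metadata
*.swp
.idea/
.vscode/

# Operating system clutter
.DS_Store
Thumbs.db
EOF
}
```

Once the file is in place, `git check-ignore -v build/app.o` reports which file, line, and pattern cause a path to be ignored. That helps when a rule turns out to be broader than you intended.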
### Review merge requests
![Git merge request][11]
When you get a merge or pull request or a patch file through email, don't just test it to make sure it works. It's your job to read new code coming into your codebase and to understand how it produces the result it does. If you disagree with the implementation, or worse, you don't comprehend the implementation, send a message back to the person submitting it and ask for clarification. It's not a social faux pas to question code looking to become a permanent fixture in your repository, but it's a breach of your social contract with your users to not know what you merge into the code they'll be using.
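Reading the code is easier when you first know what a request touches. On GitHub, for example, `git fetch origin pull/42/head:pr-42` fetches pull request 42 into a local branch, and `git diff --name-status main...pr-42` lists its changes relative to where it branched from `main`. The request number, the branch names, and the assumption that `origin` points at GitHub are all placeholders. The sketch below filters that output for newly added files, which deserve the closest reading because they are entirely new to your codebase.

```
# added_files
# Reads `git diff --name-status` output (STATUS<TAB>PATH) on stdin and
# prints the path of every file the change adds.
added_files() {
  awk -F '\t' '$1 == "A" { print $2 }'
}
```

`git diff --name-status main...pr-42 | added_files` then gives you a short list of files to open and read in full before you merge.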
### Git responsible
Good software security in open source is a community effort. Don't encourage poor Git practices in your repositories, and don't overlook a security threat in repositories you clone. Git is powerful, but it's still just a computer program, so be the human in the equation and keep everyone safe.
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/7/git-repos-best-practices
Author: [Seth Kenlon][a]
Topic selected by: [lujun9972][b]
Translator: [stevenzdg988](https://github.com/stevenzdg988)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and proudly presented by [Linux中国](https://linux.cn/).
[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/wfh_work_home_laptop_work.png?itok=VFwToeMy (Working from home at a laptop)
[2]: https://securitylab.github.com/research/octopus-scanner-malware-open-source-supply-chain/
[3]: https://opensource.com/sites/default/files/uploads/git_repo.png (Git repository)
[4]: https://opensource.com/sites/default/files/uploads/git-binary-check.jpg (Git binary check)
[5]: https://opensource.com/article/19/7/what-posix-richard-stallman-explains
[6]: https://git-scm.com/book/en/v2/Git-Tools-Submodules
[7]: https://opensource.com/sites/default/files/uploads/git-cola-manual-add.jpg (Git manual add)
[8]: https://opensource.com/sites/default/files/uploads/git-ignore.jpg (Git ignore)
[9]: https://github.com/github/gitignore
[10]: https://about.gitlab.com/releases/2016/05/22/gitlab-8-8-released
[11]: https://opensource.com/sites/default/files/uploads/git_merge_request.png (Git merge request)