6 best practices for managing Git repos

Resist the urge to add things to Git that will make it harder to manage; here's what to do instead.

Having access to source code makes it possible to analyze the security and safety of applications. But if nobody actually looks at the code, the issues won't get caught, and even when people are actively looking at code, there's usually quite a lot to look at. Fortunately, GitHub has an active security team, and recently, they revealed a Trojan that had been committed into several Git repositories, having snuck past even the repo owners. While we can't control how other people manage their own repositories, we can learn from their mistakes. To that end, this article reviews some of the best practices when it comes to adding files to your own repositories.

Know your repo

This is arguably Rule Zero for a secure Git repository. As a project maintainer, whether you started it yourself or you've adopted it from someone else, it's your job to know the contents of your own repository. You might not have a memorized list of every file in your codebase, but you need to know the basic components of what you're managing. Should a stray file appear after a few dozen merges, you'll be able to spot it easily because you won't know what it's for, and you'll need to inspect it to refresh your memory. When that happens, review the file and make sure you understand exactly why it's necessary.
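One way to keep that mental inventory current is to ask Git which files have arrived recently. The following is a minimal sketch rather than a prescribed workflow: the function name `files_added_since` is made up for illustration, and it assumes you run it from inside a repository that has at least one commit.

```shell
# Hypothetical helper: list files that were newly added (not merely
# modified) in commits more recent than the given date expression.
files_added_since() {
    # --diff-filter=A keeps only added paths; the empty --pretty format
    # suppresses commit headers so that only file names remain.
    git log --since="$1" --diff-filter=A --name-only --pretty=format: |
        sed '/^$/d' | sort -u
}

# Example call, from inside a repository:
#   files_added_since "2 weeks ago"
```

Any name in the output that you can't explain is a candidate for the inspection described above.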

Ban binary blobs

Git is meant for text, whether it's C or Python or Java written in plain text, or JSON, YAML, XML, Markdown, HTML, or something similar. Git isn't ideal for binary files.

It's the difference between this:

$ cat hello.txt
This is plain text.
It's readable by humans and machines alike.
Git knows how to version this.

$ git diff hello.txt
diff --git a/hello.txt b/hello.txt
index f227cc3..0d85b44 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1,2 +1,3 @@
 This is plain text.
+It's readable by humans and machines alike.
 Git knows how to version this.

and this:

$ git diff pixel.png
diff --git a/pixel.png b/pixel.png
index 563235a..7aab7bc 100644
Binary files a/pixel.png and b/pixel.png differ

$ cat pixel.png
<0A>PNG
▒
IHDR7n<37>$gAMA<4D><41>
              <20>abKGD݊<44>tIME<4D>

                          -2R<32><52>
IDA<44>c`<60>!<21>3%tEXtdate:create2020-06-11T11:45:04+12:00<30><30>r.%tEXtdate:modify2020-06-11T11:45:04+12:00<30><30>ʒIEND<4E>B`<60>

The data in a binary file can't be parsed in the same way plain text can be parsed, so if anything is changed in a binary file, the whole thing must be rewritten. The only difference between one version and the other is everything, which adds up quickly.

Worse still, binary data can't be reasonably audited by you, the Git repository maintainer. That's a violation of Rule Zero: know what's in your repository.

In addition to the usual POSIX tools, you can detect binaries using git diff. When you diff a binary file with the --numstat option, Git prints dashes in place of the added and deleted line counts:

$ git diff --numstat /dev/null pixel.png | tee
-       -       /dev/null => pixel.png
$ git diff --numstat /dev/null file.txt | tee
5788    0       /dev/null => file.txt
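The same `--numstat` behavior can be applied to a whole repository at once. This is a sketch under stated assumptions: the helper name `list_binary_files` is invented here, the repository must have at least one commit, and "binary" means whatever Git's own detection decides, which may differ from your expectations for unusual encodings.

```shell
# Hypothetical helper: print every file in HEAD that Git treats as binary.
list_binary_files() {
    # Compute the ID of the empty tree, so that diffing against it makes
    # every file in HEAD appear as newly added.
    empty_tree=$(git hash-object -t tree /dev/null)
    # Binary files report "-" for both the added and deleted line counts.
    git diff --numstat "$empty_tree" HEAD -- |
        awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}

# Example call, from inside a repository:
#   list_binary_files
```

Running something like this before a release gives you a short list of blobs to justify or remove.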

If you're considering committing binary blobs to your repository, stop and think about it first. If it's binary, it was generated by something. Is there a good reason not to generate it at build time instead of committing it to your repo? Should you decide it does make sense to commit binary data, make sure you identify, in a README file or similar, where the binary files are, why they're binary, and what the protocol is for updating them. Updates must be performed sparingly, because, for every change you commit to a binary blob, the storage space for that blob effectively doubles.

Keep third-party libraries third-party

Third-party libraries are no exception to this rule. While it's one of the many benefits of open source that you can freely re-use and re-distribute code you didn't write, there are many good reasons not to house a third-party library in your own repository. First of all, you can't exactly vouch for a third party, unless you've reviewed all of its code (and future merges) yourself. Secondly, when you copy third-party libraries into your Git repo, it splinters focus away from the true upstream source. Someone confident in the library is technically only confident in the master copy of the library, not in a copy lying around in a random repo. If you need to lock into a specific version of a library, either provide developers with a reasonable URL to the release your project needs or else use a Git submodule.
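For the submodule route, pinning a library to a release might look roughly like the following. Treat it as an illustrative sketch: the helper name `pin_library` and the URL, path, and tag in the example call are all placeholders, not details from any real project.

```shell
# Hypothetical helper: record a third-party library as a submodule pinned
# to one release tag. Arguments: upstream URL, local path, tag name.
pin_library() {
    lib_url=$1
    lib_path=$2
    lib_tag=$3
    # Clone the upstream repository into the chosen path, move it to the
    # requested tag, then stage the pinned commit and the .gitmodules entry.
    git submodule add "$lib_url" "$lib_path" &&
        git -C "$lib_path" checkout --quiet "$lib_tag" &&
        git add .gitmodules "$lib_path"
}

# Example call (the URL, path, and tag are placeholders):
#   pin_library https://example.com/upstream/libfoo.git vendor/libfoo v1.2.0
#   git commit -m "Pin libfoo at v1.2.0"
```

Your repository then stores only a pointer to one upstream commit, so the library's history and review burden stay with its upstream.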

Resist a blind git add

If your project is compiled, resist the urge to use git add . (where . is either the current directory or the path to a specific folder) as an easy way to add anything and everything new. This is especially important if you're not manually compiling your project, but are using an IDE to manage your project for you. It can be extremely difficult to track what's gotten added to your repository when an IDE manages your project, so it's important to only add what you've actually written and not any new object that pops up in your project folder.

If you do use git add ., review what's in staging before you push. If you see an unfamiliar object in your project folder when you do a git status, find out where it came from and why it's still in your project directory after you've run a make clean or equivalent command. It's a rare build artifact that won't regenerate during compilation, so think twice before committing it.
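Reviewing the staging area can be a one-line habit. The sketch below wraps it in a function whose name, `staged_files`, is made up for this example; the file names in the commented session are placeholders too.

```shell
# Hypothetical helper: show what is staged, one "status<TAB>path" per line
# (A = added, M = modified, D = deleted).
staged_files() {
    git diff --cached --name-status
}

# Example session (file names are placeholders):
#   git add src/parser.c          # stage by name instead of "git add ."
#   staged_files                  # confirm nothing unexpected came along
#   git restore --staged junk.o   # unstage a stray file (Git 2.23 or later)
```

Anything in that list you don't recognize deserves a look before it becomes part of a commit.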

Use Git ignore

Many of the conveniences built for programmers are also very noisy. The typical project directory for any project, whether programming, artistic, or otherwise, is littered with hidden files, metadata, and leftover artifacts. You can try to ignore these objects, but the more noise there is in your git status, the more likely you are to miss something.

You can have Git filter out this noise for you by maintaining a good gitignore file. Because that's a common requirement for anyone using Git, there are a few starter gitignore files available. GitHub.com/github/gitignore offers several purpose-built gitignore files you can download and place into your own project, and GitLab.com integrated gitignore templates into the repo creation workflow several years ago. Use these to help you build a reasonable gitignore policy for your project, and stick to it.
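As a rough illustration of what such a policy can look like, here is a sketch for a small compiled project. The patterns and both function names are assumptions chosen for the example, not an official template; adapt the patterns to your own toolchain.

```shell
# Hypothetical helper: write a small starter .gitignore into the current
# directory, refusing to overwrite one that already exists.
write_starter_gitignore() {
    if [ -e .gitignore ]; then
        echo ".gitignore already exists; not overwriting" >&2
        return 1
    fi
    cat > .gitignore <<'EOF'
# Build artifacts
*.o
build/
# Editor and OS debris
*.swp
.DS_Store
EOF
}

# Hypothetical helper: report which ignore rule, if any, matches each path.
# Exits non-zero when none of the paths is ignored.
why_ignored() {
    git check-ignore -v -- "$@"
}

# Example calls, from the top of a repository:
#   write_starter_gitignore
#   why_ignored main.o main.c
```

The `git check-ignore -v` command is handy when a file unexpectedly disappears from `git status`, because it names the file and line of the rule responsible.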

Review merge requests

When you get a merge or pull request or a patch file through email, don't just test it to make sure it works. It's your job to read new code coming into your codebase and to understand how it produces the result it does. If you disagree with the implementation, or worse, you don't comprehend the implementation, send a message back to the person submitting it and ask for clarification. It's not a social faux pas to question code looking to become a permanent fixture in your repository, but it's a breach of your social contract with your users to not know what you merge into the code they'll be using.
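Before reading the code itself, it can help to see the shape of what is about to be merged. The following sketch assumes the incoming work is already available locally as a branch (for example, after a `git fetch`); the function name `review_incoming` and the branch name in the example call are placeholders.

```shell
# Hypothetical helper: summarize a branch before merging it into the
# currently checked-out branch. Argument: the branch to review.
review_incoming() {
    incoming=$1
    # Commits on the incoming branch that the current branch lacks.
    git log --oneline "HEAD..$incoming"
    # Files the incoming branch changes, measured from the merge base.
    git diff --stat "HEAD...$incoming"
}

# Example call (the branch name is a placeholder):
#   review_incoming origin/feature-login
```

The summary is only a starting point: follow it with a full `git diff` of the same range and read every change you are about to accept.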

Git responsible

Good software security in open source is a community effort. Don't encourage poor Git practices in your repositories, and don't overlook a security threat in repositories you clone. Git is powerful, but it's still just a computer program, so be the human in the equation and keep everyone safe.


via: https://opensource.com/article/20/7/git-repos-best-practices

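Author: Seth Kenlon  Topic selection: lujun9972  Translator: stevenzdg988  Proofreader: [proofreader ID]

This article was originally compiled by LCTT and is proudly presented by Linux China.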