Merge pull request #28956 from lkxed/20230307-3-Some-possible-reasons-for-8-bit-bytes

[手动选题][tech]: 20230307.3 ⭐️⭐️⭐️ Some possible reasons for 8-bit bytes.md
2024-12-26 21:30:55 +08:00 · 2023-03-25 08:24:04 +08:00 · 2023-03-25 08:24:04 +08:00 · 93ab0ecd80
commit 93ab0ecd80
parent f862ca1fa8 5729254404
1 changed files with 312 additions and 0 deletions
--- a/sources/tech/20230307.3
+++ b/sources/tech/20230307.3
@ -0,0 +1,312 @@
+[#]: subject: "Some possible reasons for 8-bit bytes"
+[#]: via: "https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes/"
+[#]: author: "Julia Evans https://jvns.ca/"
+[#]: collector: "lkxed"
+[#]: translator: " "
+[#]: reviewer: " "
+[#]: publisher: " "
+[#]: url: " "
+
+Some possible reasons for 8-bit bytes
+======
+
+I’ve been working on a zine about how computers represent thing in binary, and
+one question I’ve gotten a few times is – why does the x86 architecture use 8-bit bytes? Why not
+some other size?
+
+With any question like this, I think there are two options:
+
+- It’s a historical accident, another size (like 4 or 6 or 16 bits) would work just as well
+- 8 bits is objectively the Best Option for some reason, even if history had played out differently we would still use 8-bit bytes
+- some mix of 1 & 2
+
+I’m not super into computer history (I like to use computers a lot more than I
+like reading about them), but I am always curious if there’s an essential
+reason for why a computer thing is the way it is today, or whether it’s mostly
+a historical accident. So we’re going to talk about some computer history.
+
+As an example of a historical accident: DNS has a `class` field which has 5
+possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”). To me that’s
+a clear example of a historical accident – I can’t imagine that we’d define
+the class field the same way if we could redesign DNS today without worrying about backwards compatibility. I’m
+not sure if we’d use a class field at all!
+
+There aren’t any definitive answers in this post, but I asked [on Mastodon][1] and
+here are some potential reasons I found for the 8-bit byte. I think the answer
+is some combination of these reasons.
+
+#### what’s the difference between a byte and a word?
+
+First, this post talks about “bytes” and “words” a lot. What’s the difference between a byte and a word? My understanding is:
+
+- the **byte size** is the smallest unit you can address. For example in a program on my machine `0x20aa87c68` might be the address of one byte, then `0x20aa87c69` is the address of the next byte.
+- The **word size** is some multiple of the byte size. I’ve been confused about
+this for years, and the Wikipedia definition is incredibly vague (“a word is
+the natural unit of data used by a particular processor design”). I
+originally thought that the word size was the same as your register size (64
+bits on x86-64). But according to section 4.1 (“Fundamental Data Types”) of the [Intel architecture manual][2],
+on x86 a word is 16 bits even though the registers are 64 bits. So I’m
+confused – is a word on x86 16 bits or 64 bits? Can it mean both, depending
+on the context? What’s the deal?
+
+Now let’s talk about some possible reasons that we use 8-bit bytes!
+
+#### reason 1: to fit the English alphabet in 1 byte
+
+[This Wikipedia article][3] says that the IBM System/360 introduced the 8-bit byte in 1964.
+
+Here’s a [video interview with Fred Brooks (who managed the project)][4] talking about why. I’ve transcribed some of it here:
+
+> … the six bit bytes [are] really better for scientific computing and the 8-bit byte ones are really better for commercial computing and each one can be made to work for the other.
+> So it came down to an executive decision and I decided for the 8-bit byte, Jerry’s proposal.
+> 
+> ...
+> 
+> My most important technical decision in my IBM career was to go with the 8-bit byte for the 360.
+> And on the basis of I believe character processing was going to become important as opposed to decimal digits.
+
+It makes sense that an 8-bit byte would be better for text processing: 2^6 is
+64, so 6 bits wouldn’t be enough for lowercase letters, uppercase letters, and symbols.
+
+To go with the 8-bit byte, System/360 also introduced the [EBCDIC][5] encoding, which is an 8-bit character encoding.
+
+It looks like the next important machine in 8-bit-byte history was the
+[Intel 8008][6], which was built to be
+used in a computer terminal (the Datapoint 2200). Terminals need to be able to
+represent letters as well as terminal control codes, so it makes sense for them
+to use an 8-bit byte.
+[This Datapoint 2200 manual from the Computer History Museum][7]
+says on page 7 that the Datapoint 2200 supported ASCII (7 bit) and EBCDIC (8 bit).
+
+#### why was the 6-bit byte better for scientific computing?
+
+I was curious about this comment that the 6-bit byte would be better for scientific computing. Here’s a quote from [this interview from Gene Amdahl][8]:
+
+> I wanted to make it 24 and 48 instead of 32 and 64, on the basis that this
+> would have given me a more rational floating point system, because in floating
+> point, with the 32-bit word, you had to keep the exponent to just 8 bits for
+> exponent sign, and to make that reasonable in terms of numeric range it could
+> span, you had to adjust by 4 bits instead of by a single bit. And so it caused
+> you to lose some of the information more rapidly than you would with binary
+> shifting
+
+I don’t understand this comment at all – why does the exponent have to be 8 bits
+if you use a 32-bit word size? Why couldn’t you use 9 bits or 10 bits if you
+wanted? But it’s all I could find in a quick search.
+
+#### why did mainframes use 36 bits?
+
+Also related to the 6-bit byte: a lot of mainframes used a 36-bit word size. Why? Someone pointed out
+that there’s a great explanation in the Wikipedia article on [36-bit computing][9]:
+
+> Prior to the introduction of computers, the state of the art in precision
+> scientific and engineering calculation was the ten-digit, electrically powered,
+> mechanical calculator… These calculators had a column of keys for each digit,
+> and operators were trained to use all their fingers when entering numbers, so
+> while some specialized calculators had more columns, ten was a practical limit.
+> 
+> Early binary computers aimed at the same market therefore often used a 36-bit
+> word length. This was long enough to represent positive and negative integers
+> to an accuracy of ten decimal digits (35 bits would have been the minimum)
+
+So this 36 bit thing seems to based on the fact that log_2(20000000000) is 34.2. Huh.
+
+My guess is that the reason for this is in the 50s, computers were
+extremely expensive. So if you wanted your computer to support ten decimal
+digits, you’d design so that it had exactly enough bits to do that, and no
+more.
+
+Today computers are way faster and cheaper, so if you want to represent ten
+decimal digits for some reason you can just use 64 bits – wasting a little bit
+of space is usually no big deal.
+
+Someone else mentioned that some of these machines with 36-bit word sizes let
+you choose a byte size – you could use 5 or 6 or 7 or 8-bit bytes, depending
+on the context.
+
+#### reason 2: to work well with binary-coded decimal
+
+In the 60s, there was a popular integer encoding called binary-coded decimal (or [BCD][10] for short) that
+encoded every decimal digit in 4 bits.
+
+For example, if you wanted to encode the number 1234, in BCD that would be something like:
+
+```
+0001 0010 0011 0100
+```
+
+So if you want to be able to easily work with binary-coded decimal, your byte
+size should be a multiple of 4 bits, like 8 bits!
+
+#### why was BCD popular?
+
+This integer representation seemed really weird to me – why not just use
+binary, which is a much more efficient way to store integers? Efficiency was really important in early computers!
+
+My best guess about why is that early computers didn’t have displays the same way we do
+now, so the contents of a byte were mapped directly to on/off lights.
+
+Here’s a [picture from Wikipedia of an IBM 650 with some lights on its display][11] ([CC BY-SA 3.0][12]):
+
+![][13]
+
+So if you want people to be relatively able to easily read off a decimal number
+from its binary representation, this makes a lot more sense. I think today BCD
+is obsolete because we have displays and our computers can convert numbers
+represented in binary to decimal for us and display them.
+
+Also, I wonder if BCD is where the term “nibble” for 4 bits comes from – in
+the context of BCD, you end up referring to half bytes a lot (because every
+digits is 4 bits). So it makes sense to have a word for “4 bits”, and people
+called 4 bits a nibble. Today “nibble” feels to me like an archaic term though –
+I’ve definitely never used it except as a fun fact (it’s such a fun word!). The Wikipedia article on [nibbles][14] supports this theory:
+
+> The nibble is used to describe the amount of memory used to store a digit of
+> a number stored in packed decimal format (BCD) within an IBM mainframe.
+
+Another reason someone mentioned for BCD was **financial calculations**. Today
+if you want to store a dollar amount, you’ll typically just use an integer
+amount of cents, and then divide by 100 if you want the dollar part. This is no
+big deal, division is fast. But apparently in the 70s dividing an integer
+represented in binary by 100 was very slow, so it was worth it to redesign how
+you represent your integers to avoid having to divide by 100.
+
+Okay, enough about BCD.
+
+#### reason 3: 8 is a power of 2?
+
+A bunch of people said it’s important for a CPU’s byte size to be a power of 2.
+I can’t figure out whether this is true or not though, and I wasn’t satisfied with the explanation that “computers use binary so powers of 2 are good”. That seems very plausible but I wanted to dig deeper.
+And historically there have definitely been lots of machines that used byte sizes that weren’t powers of 2, for example (from [this retro computing stack exchange thread][15]):
+
+- Cyber 180 mainframes used 6-bit bytes
+- the Univac 1100 / 2200 series used a 36-bit word size
+- the PDP-8 was a 12-bit machine
+
+Some reasons I heard for why powers of 2 are good that I haven’t understood yet:
+
+- every bit in a word needs a bus, and you want the number of buses to be a power of 2 (why?)
+- a lot of circuit logic is susceptible to divide-and-conquer techniques (I think I need an example to understand this)
+
+Reasons that made more sense to me:
+
+- it makes it easier to design **clock dividers** that can measure “8 bits were
+sent on this wire” that work based on halving – you can put 3 halving clock
+dividers in series. [Graham Sutherland][16] told me about this and made this really cool
+[simulator of clock dividers][17] showing what these clock dividers look like. That site (Falstad) also has a bunch of other example circuits and it seems like a really cool way to make circuit simulators.
+- if you have an instruction that zeroes out a specific bit in a byte, then if
+your byte size is 8 (2^3), you can use just 3 bits of your instruction to
+indicate which bit. x86 doesn’t seem to do this, but the [Z80’s bit testing instructions][18] do.
+- someone mentioned that some processors use [Carry-lookahead adders][19], and they work
+in groups of 4 bits. From some quick Googling it seems like there are a wide
+variety of adder circuits out there though.
+- **bitmaps**: Your computer’s memory is organized into pages (usually of size 2^n). It
+needs to keep track of whether every page is free or not. Operating systems
+use a bitmap to do this, where each bit corresponds to a page and is 0 or 1
+depending on whether the page is free. If you had a 9-bit byte, you would
+need to divide by 9 to find the page you’re looking for in the bitmap.
+Dividing by 9 is slower than dividing by 8, because dividing by powers of 2
+is always the fastest thing.
+
+I probably mangled some of those explanations pretty badly: I’m pretty far out
+of my comfort zone here. Let’s move on.
+
+#### reason 4: small byte sizes are good
+
+You might be wondering – well, if 8-bit bytes were better than 4-bit bytes,
+why not keep increasing the byte size? We could have 16-bit bytes!
+
+A couple of reasons to keep byte sizes small:
+
+- It’s a waste of space – a byte is the minimum unit you can address, and if
+your computer is storing a lot of ASCII text (which only needs 7 bits), it
+would be a pretty big waste to dedicate 12 or 16 bits to each character when
+you could use 8 bits instead.
+- As bytes get bigger, your CPU needs to get more complex. For example you need one bus line per bit. So I guess simpler is better.
+
+My understanding of CPU architecture is extremely shaky so I’ll leave it at
+that. The “it’s a waste of space” reason feels pretty compelling to me though.
+
+#### reason 5: compatibility
+
+The Intel 8008 (from 1972) was the precursor to the 8080 (from 1974), which was the precursor to the
+8086 (from 1976) – the first x86 processor. It seems like the 8080 and the
+8086 were really popular and that’s where we get our modern x86 computers.
+
+I think there’s an “if it ain’t broke don’t fix it” thing going on here – I
+assume that 8-bit bytes were working well, so Intel saw no need to change the
+design. If you keep the same 8-bit byte, then you can reuse more of your
+instruction set.
+
+Also around the 80s we start getting network protocols like TCP
+which use 8-bit bytes (usually called “octets”), and if you’re going to be
+implementing network protocols, you probably want to be using an 8-bit byte.
+
+#### that’s all!
+
+It seems to me like the main reasons for the 8-bit byte are:
+
+- a lot of early computer companies were American, the most commonly used language in the US is English
+- those people wanted computers to be good at text processing
+- smaller byte sizes are in general better
+- 7 bits is the smallest size you can fit all English characters + punctuation in
+- 8 is a better number than 7 (because it’s a power of 2)
+- once you have popular 8-bit computers that are working well, you want to keep the same design for compatibility
+
+Someone pointed out that [page 65 of this book from 1962][20]
+talking about IBM’s reasons to choose an 8-bit byte basically says the same thing:
+
+- Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
+- Within the limits of this capacity, a single character is represented by a
+single byte, so that the length of any particular record is not dependent on
+the coincidence of characters in that record.
+- 8-bit bytes are reasonably economical of storage space
+- For purely numerical work, a decimal digit can be represented by only 4
+bits, and two such 4-bit bytes can be packed in an 8-bit byte. Although such
+packing of numerical data is not essential, it is a common practice in
+order to increase speed and storage efficiency. Strictly speaking, 4-bit
+bytes belong to a different code, but the simplicity of the 4-and-8-bit
+scheme, as compared with a combination 4-and-6-bit scheme, for example,
+leads to simpler machine design and cleaner addressing logic.
+- Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer
+to take advantage of powerful features of binary addressing and indexing to
+the bit level (see Chaps. 4 and 5 ) .
+
+> 
+
+Overall this makes me feel like an 8-bit byte is a pretty natural choice if
+you’re designing a binary computer in an English-speaking country.
+
+--------------------------------------------------------------------------------
+
+via: https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes/
+
+作者：[Julia Evans][a]
+选题：[lkxed][b]
+译者：[译者ID](https://github.com/译者ID)
+校对：[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://jvns.ca/
+[b]: https://github.com/lkxed/
+[1]: https://social.jvns.ca/@b0rk/109976810279702728
+[2]: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
+[3]: https://en.wikipedia.org/wiki/IBM_System/360
+[4]: https://www.youtube.com/watch?v=9oOCrAePJMs&t=140s
+[5]: https://en.wikipedia.org/wiki/EBCDIC
+[6]: https://en.wikipedia.org/wiki/Intel_8008
+[7]: https://archive.computerhistory.org/resources/text/2009/102683240.05.02.acc.pdf
+[8]: https://archive.computerhistory.org/resources/access/text/2013/05/102702492-05-01-acc.pdf
+[9]: https://en.wikipedia.org/wiki/36-bit_computing
+[10]: https://en.wikipedia.org/wiki/Binary-coded_decimal
+[11]: https://commons.wikimedia.org/wiki/File:IBM-650-panel.jpg
+[12]: http://creativecommons.org/licenses/by-sa/3.0/
+[13]: https://upload.wikimedia.org/wikipedia/commons/a/ad/IBM-650-panel.jpg
+[14]: https://en.wikipedia.org/wiki/Nibble
+[15]: https://retrocomputing.stackexchange.com/questions/7937/last-computer-not-to-use-octets-8-bit-bytes
+[16]: https://poly.nomial.co.uk/
+[17]: https://www.falstad.com/circuit/circuitjs.html?ctz=CQAgjCAMB0l3BWcMBMcUHYMGZIA4UA2ATmIxAUgpABZsKBTAWjDACgwEknsUQ08tQQKgU2AdxA8+I6eAyEoEqb3mK8VMAqWSNakHsx9Iywxj6Ea-c0oBKUy-xpUWYGc-D9kcftCQo-URgEZRQERSMnKkiTSTDFLQjw62NlMBorRP5krNjwDP58fMztE04kdKsRFBQqoqoQyUcRVhl6tLdCwVaonXBO2s0Cwb6UPGEPXmiPPLHhIrne2Y9q8a6lcpAp9edo+r7tkW3c5WPtOj4TyQv9G5jlO5saMAibPOeIoppm9oAPEEU2C0-EBaFoThAAHoUGx-mA8FYgfNESgIFUrNDYVtCBBttg8LiUPR0VCYWhyD0Wp0slYACIASQAamTIORFqtuucQAzGTQ2OTaD9BN8Soo6Uy8PzWQ46oImI4aSB6QA5ZTy9EuVQjPLq3q6kQmAD21Beome0qQMHgkDIhHCYVEfCQ9BVbGNRHAiio5vIltg8Ft9stXg99B5MPdFK9tDAFqg-rggcIDui1i23KZfPd3WjPuoVoDCiDjv4gjDErYQA
+[18]: http://www.chebucto.ns.ca/~af380/z-80-h.htm
+[19]: https://en.wikipedia.org/wiki/Carry-lookahead_adder
+[20]: https://web.archive.org/web/20170403014651/http://archive.computerhistory.org/resources/text/IBM/Stretch/pdfs/Buchholz_102636426.pdf