[#]: subject: "Some possible reasons for 8-bit bytes"
[#]: via: "https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes/"
[#]: author: "Julia Evans https://jvns.ca/"
[#]: collector: "lkxed"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
Some possible reasons for 8-bit bytes
======
I've been working on a zine about how computers represent things in binary, and
one question I've gotten a few times is: why does the x86 architecture use 8-bit bytes? Why not
some other size?

With any question like this, I think there are two options (or a blend of them):

1. It's a historical accident: another size (like 4 or 6 or 16 bits) would work just as well
2. 8 bits is objectively the Best Option for some reason: even if history had played out differently, we would still use 8-bit bytes
3. some mix of 1 & 2

I'm not super into computer history (I like to use computers a lot more than I
like reading about them), but I am always curious whether there's an essential
reason why a computer thing is the way it is today, or whether it's mostly
a historical accident. So we're going to talk about some computer history.

As an example of a historical accident: DNS has a `class` field which has 5
possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”). To me that's
a clear example of a historical accident: I can't imagine that we'd define
the class field the same way if we could redesign DNS today without worrying about backwards compatibility. I'm
not sure if we'd use a class field at all!

There aren't any definitive answers in this post, but I asked [on Mastodon][1] and
here are some potential reasons I found for the 8-bit byte. I think the answer
is some combination of these reasons.
#### what's the difference between a byte and a word?

First, this post talks about “bytes” and “words” a lot. What's the difference between a byte and a word? My understanding is:

- The **byte size** is the smallest unit you can address. For example, in a program on my machine `0x20aa87c68` might be the address of one byte, and then `0x20aa87c69` is the address of the next byte.
- The **word size** is some multiple of the byte size. I've been confused about
  this for years, and the Wikipedia definition is incredibly vague (“a word is
  the natural unit of data used by a particular processor design”). I
  originally thought that the word size was the same as your register size (64
  bits on x86-64). But according to section 4.1 (“Fundamental Data Types”) of the [Intel architecture manual][2],
  on x86 a word is 16 bits even though the registers are 64 bits. So I'm
  confused: is a word on x86 16 bits or 64 bits? Can it mean both, depending
  on the context? What's the deal?

Now let's talk about some possible reasons that we use 8-bit bytes!
#### reason 1: to fit the English alphabet in 1 byte

[This Wikipedia article][3] says that the IBM System/360 introduced the 8-bit byte in 1964.

Here's a [video interview with Fred Brooks (who managed the project)][4] talking about why. I've transcribed some of it here:

> … the six bit bytes [are] really better for scientific computing and the 8-bit byte ones are really better for commercial computing and each one can be made to work for the other.
> So it came down to an executive decision and I decided for the 8-bit byte, Jerry's proposal.
>
> ...
>
> My most important technical decision in my IBM career was to go with the 8-bit byte for the 360.
> And on the basis of I believe character processing was going to become important as opposed to decimal digits.

It makes sense that an 8-bit byte would be better for text processing: 2^6 is
64, so 6 bits wouldn't be enough for lowercase letters, uppercase letters, and symbols,
since the 52 letters alone leave only 12 codes for digits, punctuation, and everything else.
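
To make that counting concrete, here is a minimal Python sketch. The character set it tallies (ASCII letters, digits, and punctuation from Python's `string` module) is an assumption chosen for illustration, not the set IBM actually had in mind, and `bits_needed` is a hypothetical helper.

```python
import string


def bits_needed(n_symbols: int) -> int:
    """Smallest number of bits that gives each of n_symbols a distinct code."""
    return max(1, (n_symbols - 1).bit_length())


# Assumed character set: ASCII letters, digits, and punctuation.
letters = len(string.ascii_lowercase) + len(string.ascii_uppercase)  # 26 + 26
others = len(string.digits) + len(string.punctuation)                # 10 + 32
total = letters + others

print(f"6 bits give {2 ** 6} codes, 7 bits give {2 ** 7}, 8 bits give {2 ** 8}")
print(f"{total} characters need at least {bits_needed(total)} bits")
```

With this particular set there are 94 characters, so 6 bits fall short and 7 is the minimum, before counting the space character or any control codes.
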
To go with the 8-bit byte, System/360 also introduced the [EBCDIC][5] encoding, which is an 8-bit character encoding.

It looks like the next important machine in 8-bit-byte history was the
[Intel 8008][6], which was built to be
used in a computer terminal (the Datapoint 2200). Terminals need to be able to
represent letters as well as terminal control codes, so it makes sense for them
to use an 8-bit byte.

[This Datapoint 2200 manual from the Computer History Museum][7]
says on page 7 that the Datapoint 2200 supported ASCII (7 bit) and EBCDIC (8 bit).

#### why was the 6-bit byte better for scientific computing?

I was curious about this comment that the 6-bit byte would be better for scientific computing. Here's a quote from [this interview with Gene Amdahl][8]:

> I wanted to make it 24 and 48 instead of 32 and 64, on the basis that this
> would have given me a more rational floating point system, because in floating
> point, with the 32-bit word, you had to keep the exponent to just 8 bits for
> exponent sign, and to make that reasonable in terms of numeric range it could
> span, you had to adjust by 4 bits instead of by a single bit. And so it caused
> you to lose some of the information more rapidly than you would with binary
> shifting

I don't understand this comment at all: why does the exponent have to be 8 bits
if you use a 32-bit word size? Why couldn't you use 9 bits or 10 bits if you
wanted? But it's all I could find in a quick search.
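
One possible reading of the "adjust by 4 bits" part, offered as a guess rather than something the interview spells out: the System/360 single-precision format puts a sign bit and a 7-bit exponent in the first byte, and that exponent counts powers of 16 rather than powers of 2. Normalizing a number then shifts the fraction 4 bits at a time, so up to 3 of the 24 fraction bits can end up as leading zeros. The sketch below decodes that layout; `decode_hfp32` and `wasted_leading_bits` are hypothetical names, and it says nothing about why the exponent field had to stay that small.

```python
def decode_hfp32(word: int) -> float:
    """Decode a 32-bit hexadecimal float: 1 sign bit, 7-bit excess-64
    exponent (a power of 16), and a 24-bit fraction."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = ((word >> 24) & 0x7F) - 64
    fraction = (word & 0xFFFFFF) / (1 << 24)
    return sign * fraction * 16.0 ** exponent


def wasted_leading_bits(word: int) -> int:
    """Leading zero bits in the 24-bit fraction of a nonzero value."""
    return 24 - (word & 0xFFFFFF).bit_length()


for word in (0x41100000, 0x42640000, 0xC276A000):
    print(hex(word), decode_hfp32(word), wasted_leading_bits(word))
```

The three sample words decode to 1.0, 100.0, and -118.625. The encoding of 1.0 carries 3 leading zero bits in its fraction, which is the kind of lost precision the quote seems to be describing.
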
#### why did mainframes use 36 bits?

Also related to the 6-bit byte: a lot of mainframes used a 36-bit word size. Why? Someone pointed out
that there's a great explanation in the Wikipedia article on [36-bit computing][9]:

> Prior to the introduction of computers, the state of the art in precision
> scientific and engineering calculation was the ten-digit, electrically powered,
> mechanical calculator… These calculators had a column of keys for each digit,
> and operators were trained to use all their fingers when entering numbers, so
> while some specialized calculators had more columns, ten was a practical limit.
>
> Early binary computers aimed at the same market therefore often used a 36-bit
> word length. This was long enough to represent positive and negative integers
> to an accuracy of ten decimal digits (35 bits would have been the minimum)

So this 36 bit thing seems to be based on the fact that log_2(20000000000) is 34.2,
which means ten decimal digits plus a sign need at least 35 bits. Huh.

My guess is that the reason for this is that in the 50s, computers were
extremely expensive. So if you wanted your computer to support ten decimal
digits, you'd design it so that it had just enough bits to do that, and no
more.
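
The 35-bit minimum from the quote can be checked with a few lines of Python. This is only a sanity check of the arithmetic, counting every integer from -9,999,999,999 to 9,999,999,999; the variable names are made up for the example.

```python
import math

DIGITS = 10
largest = 10 ** DIGITS - 1            # 9,999,999,999
# Count every value from -largest to +largest, with a single zero.
distinct_values = 2 * largest + 1     # 19,999,999,999

print(round(math.log2(2 * 10 ** DIGITS), 1))    # 34.2
min_bits = (distinct_values - 1).bit_length()   # bits needed to number them all
print(min_bits)                                 # 35
```
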
Today computers are way faster and cheaper, so if you want to represent ten
decimal digits for some reason you can just use 64 bits: wasting a little bit
of space is usually no big deal.

Someone else mentioned that some of these machines with 36-bit word sizes let
you choose a byte size: you could use 5 or 6 or 7 or 8-bit bytes, depending
on the context.
#### reason 2: to work well with binary-coded decimal

In the 60s, there was a popular integer encoding called binary-coded decimal (or [BCD][10] for short) that
encoded every decimal digit in 4 bits.

For example, if you wanted to encode the number 1234, in BCD that would be something like:

```
0001 0010 0011 0100
```

So if you want to be able to easily work with binary-coded decimal, your byte
size should be a multiple of 4 bits, like 8 bits!
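
Here is a small Python sketch of the encoding above. `to_bcd` and `to_packed_bcd` are hypothetical helper names, and real machines had their own conventions for signs and padding that this ignores.

```python
def to_bcd(n: int) -> str:
    """Write each decimal digit of a non-negative integer as 4 bits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return " ".join(format(int(digit), "04b") for digit in str(n))


def to_packed_bcd(n: int) -> int:
    """Pack the same 4-bit digits into one integer, two digits per 8-bit byte."""
    packed = 0
    for digit in str(n):
        packed = (packed << 4) | int(digit)
    return packed


print(to_bcd(1234))              # 0001 0010 0011 0100
print(hex(to_packed_bcd(1234)))  # 0x1234
print((1234).bit_length())       # 11 bits in plain binary, versus 16 in BCD
```

A nice side effect is that the hexadecimal dump of a packed BCD value reads the same as the decimal number, and the four digits fill exactly two 8-bit bytes.
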
#### why was BCD popular?

This integer representation seemed really weird to me: why not just use
binary, which is a much more efficient way to store integers? Efficiency was really important in early computers!

My best guess about why is that early computers didn't have displays the same way we do
now, so the contents of a byte were mapped directly to on/off lights.

Here's a [picture from Wikipedia of an IBM 650 with some lights on its display][11] ([CC BY-SA 3.0][12]):

![][13]

So if you want people to be able to read off a decimal number relatively easily
from its binary representation, this makes a lot more sense. I think today BCD
is obsolete because we have displays and our computers can convert numbers
represented in binary to decimal for us and display them.

Also, I wonder if BCD is where the term “nibble” for 4 bits comes from: in
the context of BCD, you end up referring to half bytes a lot (because every
digit is 4 bits). So it makes sense to have a word for “4 bits”, and people
called 4 bits a nibble. Today “nibble” feels to me like an archaic term though:
I've definitely never used it except as a fun fact (it's such a fun word!). The Wikipedia article on [nibbles][14] supports this theory:

> The nibble is used to describe the amount of memory used to store a digit of
> a number stored in packed decimal format (BCD) within an IBM mainframe.

Another reason someone mentioned for BCD was **financial calculations**. Today
if you want to store a dollar amount, you'll typically just use an integer
amount of cents, and then divide by 100 if you want the dollar part. This is no
big deal, division is fast. But apparently in the 70s dividing an integer
represented in binary by 100 was very slow, so it was worth it to redesign how
you represent your integers to avoid having to divide by 100.
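
A rough sketch of why BCD sidesteps that division: if the amount in cents is stored as packed BCD, the last two decimal digits are simply the last two nibbles, so splitting dollars from cents is a shift and a mask. The helper names here are hypothetical, and this illustrates the idea only, not how any particular 70s machine did it.

```python
def pack_bcd(n: int) -> int:
    """Pack the decimal digits of a non-negative integer, 4 bits per digit."""
    packed = 0
    for digit in str(n):
        packed = (packed << 4) | int(digit)
    return packed


def split_dollars_cents(packed_cents: int) -> tuple:
    """Split a packed-BCD amount in cents without dividing by 100."""
    dollars = packed_cents >> 8    # drop the last two digits (2 nibbles = 8 bits)
    cents = packed_cents & 0xFF    # keep only the last two digits
    # Every nibble is 0-9, so the hex digits are the decimal digits.
    return format(dollars, "x"), format(cents, "02x")


amount_in_cents = 123456                               # $1234.56
print(split_dollars_cents(pack_bcd(amount_in_cents)))  # ('1234', '56')
print(divmod(amount_in_cents, 100))                    # (1234, 56), the binary way
```
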
Okay, enough about BCD.
#### reason 3: 8 is a power of 2?

A bunch of people said it's important for a CPU's byte size to be a power of 2.
I can't figure out whether this is true or not though, and I wasn't satisfied with the explanation that “computers use binary so powers of 2 are good”. That seems very plausible but I wanted to dig deeper.

And historically there have definitely been lots of machines that used byte sizes that weren't powers of 2, for example (from [this retro computing stack exchange thread][15]):

- Cyber 180 mainframes used 6-bit bytes
- the Univac 1100 / 2200 series used a 36-bit word size
- the PDP-8 was a 12-bit machine

Some reasons I heard for why powers of 2 are good that I haven't understood yet:

- every bit in a word needs a bus, and you want the number of buses to be a power of 2 (why?)
- a lot of circuit logic is susceptible to divide-and-conquer techniques (I think I need an example to understand this)

Reasons that made more sense to me:

- it makes it easier to design **clock dividers** that can measure “8 bits were
  sent on this wire” that work based on halving: you can put 3 halving clock
  dividers in series. [Graham Sutherland][16] told me about this and made this really cool
  [simulator of clock dividers][17] showing what these clock dividers look like. That site (Falstad) also has a bunch of other example circuits and it seems like a really cool way to make circuit simulators.
- if you have an instruction that zeroes out a specific bit in a byte, then if
  your byte size is 8 (2^3), you can use just 3 bits of your instruction to
  indicate which bit. x86 doesn't seem to do this, but the [Z80's bit testing instructions][18] do.
- someone mentioned that some processors use [Carry-lookahead adders][19], and they work
  in groups of 4 bits. From some quick Googling it seems like there are a wide
  variety of adder circuits out there though.
- **bitmaps**: Your computer's memory is organized into pages (usually of size 2^n). It
  needs to keep track of whether every page is free or not. Operating systems
  use a bitmap to do this, where each bit corresponds to a page and is 0 or 1
  depending on whether the page is free. If you had a 9-bit byte, you would
  need to divide by 9 to find the page you're looking for in the bitmap.
  Dividing by 9 is slower than dividing by 8, because dividing by a power of 2
  is just a bit shift, which is about as cheap as an operation gets.
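
The bitmap point can be sketched in Python like this. `PageBitmap` is a hypothetical class, not how any real operating system lays out its page tracking, and the choice of 1 for "in use" is an assumption.

```python
class PageBitmap:
    """One bit per page; in this sketch 1 means "in use" and 0 means "free"."""

    def __init__(self, n_pages: int) -> None:
        self.bits = bytearray((n_pages + 7) >> 3)  # round up to whole bytes

    @staticmethod
    def locate(page: int) -> tuple:
        # With 8-bit bytes, "divide by 8" is a shift and "mod 8" is a mask.
        return page >> 3, page & 0b111

    def mark_used(self, page: int) -> None:
        byte, bit = self.locate(page)
        self.bits[byte] |= 1 << bit

    def mark_free(self, page: int) -> None:
        byte, bit = self.locate(page)
        self.bits[byte] &= ~(1 << bit) & 0xFF

    def is_used(self, page: int) -> bool:
        byte, bit = self.locate(page)
        return bool((self.bits[byte] >> bit) & 1)


bitmap = PageBitmap(20)
bitmap.mark_used(10)
print(PageBitmap.locate(10))   # (1, 2): byte 1, bit 2
print(bitmap.is_used(10), bitmap.is_used(11))
print(divmod(10, 9))           # a 9-bit byte would need a real division here
```

The bit index from `locate` always fits in 3 bits, which is the same property the bit-addressing instructions in the list rely on.
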
I probably mangled some of those explanations pretty badly: I'm pretty far out
of my comfort zone here. Let's move on.
#### reason 4: small byte sizes are good

You might be wondering: well, if 8-bit bytes were better than 4-bit bytes,
why not keep increasing the byte size? We could have 16-bit bytes!

A couple of reasons to keep byte sizes small:

- It's a waste of space: a byte is the minimum unit you can address, and if
  your computer is storing a lot of ASCII text (which only needs 7 bits), it
  would be a pretty big waste to dedicate 12 or 16 bits to each character when
  you could use 8 bits instead.
- As bytes get bigger, your CPU needs to get more complex. For example you need one bus line per bit. So I guess simpler is better.

My understanding of CPU architecture is extremely shaky so I'll leave it at
that. The “it's a waste of space” reason feels pretty compelling to me though.
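
To put rough numbers on the wasted space, here is a small Python comparison. Encoding with UTF-16 is only a stand-in for a hypothetical machine with 16-bit bytes (an assumption made for illustration), and the sample string is arbitrary.

```python
text = "Hello, 8-bit bytes!"

as_8bit = text.encode("ascii")        # one 8-bit byte per character
as_16bit = text.encode("utf-16-le")   # stand-in for 16 bits per character

useful_bits = 7 * len(text)           # ASCII only needs 7 bits per character
for label, data in (("8-bit", as_8bit), ("16-bit", as_16bit)):
    total_bits = 8 * len(data)
    wasted = total_bits - useful_bits
    print(f"{label}: {len(data)} bytes, {wasted / total_bits:.1%} of the bits unused")
```

With 8-bit bytes one bit in eight goes unused for ASCII text. With 16-bit units more than half of the storage would be padding.
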
#### reason 5: compatibility

The Intel 8008 (from 1972) was the precursor to the 8080 (from 1974), which was the precursor to the
8086 (from 1978), the first x86 processor. It seems like the 8080 and the
8086 were really popular and that's where we get our modern x86 computers.

I think there's an “if it ain't broke don't fix it” thing going on here: I
assume that 8-bit bytes were working well, so Intel saw no need to change the
design. If you keep the same 8-bit byte, then you can reuse more of your
instruction set.

Also around the 80s we start getting network protocols like TCP
which use 8-bit bytes (usually called “octets”), and if you're going to be
implementing network protocols, you probably want to be using an 8-bit byte.
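
As a small illustration of the octet point: a TCP header starts with two 16-bit fields, the source port and the destination port, and each one goes on the wire as 2 octets. This Python sketch packs just those two fields with the standard `struct` module. The port numbers are arbitrary example values, and this is nowhere near a full TCP header.

```python
import struct

# Hypothetical example values: an ephemeral client port talking to HTTPS.
source_port, dest_port = 49152, 443

# "!" = network (big-endian) byte order, "H" = unsigned 16-bit integer.
first_fields = struct.pack("!HH", source_port, dest_port)

print(len(first_fields))                    # 4 octets
print(list(first_fields))                   # [192, 0, 1, 187]
print(struct.unpack("!HH", first_fields))   # (49152, 443)
```

On a machine with 8-bit bytes each octet maps onto exactly one addressable byte, so code like this needs no repacking.
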
#### that's all!

It seems to me like the main reasons for the 8-bit byte are:

- a lot of early computer companies were American, and the most commonly used language in the US is English
- those people wanted computers to be good at text processing
- smaller byte sizes are in general better
- 7 bits is the smallest size you can fit all English characters + punctuation in
- 8 is a better number than 7 (because it's a power of 2)
- once you have popular 8-bit computers that are working well, you want to keep the same design for compatibility

Someone pointed out that [page 65 of this book from 1962][20],
talking about IBM's reasons to choose an 8-bit byte, basically says the same thing:

- Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
- Within the limits of this capacity, a single character is represented by a
  single byte, so that the length of any particular record is not dependent on
  the coincidence of characters in that record.
- 8-bit bytes are reasonably economical of storage space
- For purely numerical work, a decimal digit can be represented by only 4
  bits, and two such 4-bit bytes can be packed in an 8-bit byte. Although such
  packing of numerical data is not essential, it is a common practice in
  order to increase speed and storage efficiency. Strictly speaking, 4-bit
  bytes belong to a different code, but the simplicity of the 4-and-8-bit
  scheme, as compared with a combination 4-and-6-bit scheme, for example,
  leads to simpler machine design and cleaner addressing logic.
- Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer
  to take advantage of powerful features of binary addressing and indexing to
  the bit level (see Chaps. 4 and 5).

Overall this makes me feel like an 8-bit byte is a pretty natural choice if
you're designing a binary computer in an English-speaking country.
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes/
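Author: [Julia Evans][a]
Topic selection: [lkxed][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)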
[a]: https://jvns.ca/
[b]: https://github.com/lkxed/
[1]: https://social.jvns.ca/@b0rk/109976810279702728
[2]: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
[3]: https://en.wikipedia.org/wiki/IBM_System/360
[4]: https://www.youtube.com/watch?v=9oOCrAePJMs&t=140s
[5]: https://en.wikipedia.org/wiki/EBCDIC
[6]: https://en.wikipedia.org/wiki/Intel_8008
[7]: https://archive.computerhistory.org/resources/text/2009/102683240.05.02.acc.pdf
[8]: https://archive.computerhistory.org/resources/access/text/2013/05/102702492-05-01-acc.pdf
[9]: https://en.wikipedia.org/wiki/36-bit_computing
[10]: https://en.wikipedia.org/wiki/Binary-coded_decimal
[11]: https://commons.wikimedia.org/wiki/File:IBM-650-panel.jpg
[12]: http://creativecommons.org/licenses/by-sa/3.0/
[13]: https://upload.wikimedia.org/wikipedia/commons/a/ad/IBM-650-panel.jpg
[14]: https://en.wikipedia.org/wiki/Nibble
[15]: https://retrocomputing.stackexchange.com/questions/7937/last-computer-not-to-use-octets-8-bit-bytes
[16]: https://poly.nomial.co.uk/
[17]: https://www.falstad.com/circuit/circuitjs.html?ctz=CQAgjCAMB0l3BWcMBMcUHYMGZIA4UA2ATmIxAUgpABZsKBTAWjDACgwEknsUQ08tQQKgU2AdxA8+I6eAyEoEqb3mK8VMAqWSNakHsx9Iywxj6Ea-c0oBKUy-xpUWYGc-D9kcftCQo-URgEZRQERSMnKkiTSTDFLQjw62NlMBorRP5krNjwDP58fMztE04kdKsRFBQqoqoQyUcRVhl6tLdCwVaonXBO2s0Cwb6UPGEPXmiPPLHhIrne2Y9q8a6lcpAp9edo+r7tkW3c5WPtOj4TyQv9G5jlO5saMAibPOeIoppm9oAPEEU2C0-EBaFoThAAHoUGx-mA8FYgfNESgIFUrNDYVtCBBttg8LiUPR0VCYWhyD0Wp0slYACIASQAamTIORFqtuucQAzGTQ2OTaD9BN8Soo6Uy8PzWQ46oImI4aSB6QA5ZTy9EuVQjPLq3q6kQmAD21Beome0qQMHgkDIhHCYVEfCQ9BVbGNRHAiio5vIltg8Ft9stXg99B5MPdFK9tDAFqg-rggcIDui1i23KZfPd3WjPuoVoDCiDjv4gjDErYQA
[18]: http://www.chebucto.ns.ca/~af380/z-80-h.htm
[19]: https://en.wikipedia.org/wiki/Carry-lookahead_adder
[20]: https://web.archive.org/web/20170403014651/http://archive.computerhistory.org/resources/text/IBM/Stretch/pdfs/Buchholz_102636426.pdf