mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-01 21:50:13 +08:00
313 lines
17 KiB
Markdown
313 lines
17 KiB
Markdown
|
[#]: subject: "Some possible reasons for 8-bit bytes"
|
|||
|
[#]: via: "https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes/"
|
|||
|
[#]: author: "Julia Evans https://jvns.ca/"
|
|||
|
[#]: collector: "lkxed"
|
|||
|
[#]: translator: " "
|
|||
|
[#]: reviewer: " "
|
|||
|
[#]: publisher: " "
|
|||
|
[#]: url: " "
|
|||
|
|
|||
|
Some possible reasons for 8-bit bytes
|
|||
|
======
|
|||
|
|
|||
|
I’ve been working on a zine about how computers represent thing in binary, and
|
|||
|
one question I’ve gotten a few times is – why does the x86 architecture use 8-bit bytes? Why not
|
|||
|
some other size?
|
|||
|
|
|||
|
With any question like this, I think there are two options:
|
|||
|
|
|||
|
- It’s a historical accident, another size (like 4 or 6 or 16 bits) would work just as well
|
|||
|
- 8 bits is objectively the Best Option for some reason, even if history had played out differently we would still use 8-bit bytes
|
|||
|
- some mix of 1 & 2
|
|||
|
|
|||
|
I’m not super into computer history (I like to use computers a lot more than I
|
|||
|
like reading about them), but I am always curious if there’s an essential
|
|||
|
reason for why a computer thing is the way it is today, or whether it’s mostly
|
|||
|
a historical accident. So we’re going to talk about some computer history.
|
|||
|
|
|||
|
As an example of a historical accident: DNS has a `class` field which has 5
|
|||
|
possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”). To me that’s
|
|||
|
a clear example of a historical accident – I can’t imagine that we’d define
|
|||
|
the class field the same way if we could redesign DNS today without worrying about backwards compatibility. I’m
|
|||
|
not sure if we’d use a class field at all!
|
|||
|
|
|||
|
There aren’t any definitive answers in this post, but I asked [on Mastodon][1] and
|
|||
|
here are some potential reasons I found for the 8-bit byte. I think the answer
|
|||
|
is some combination of these reasons.
|
|||
|
|
|||
|
#### what’s the difference between a byte and a word?
|
|||
|
|
|||
|
First, this post talks about “bytes” and “words” a lot. What’s the difference between a byte and a word? My understanding is:
|
|||
|
|
|||
|
- the **byte size** is the smallest unit you can address. For example in a program on my machine `0x20aa87c68` might be the address of one byte, then `0x20aa87c69` is the address of the next byte.
|
|||
|
- The **word size** is some multiple of the byte size. I’ve been confused about
|
|||
|
this for years, and the Wikipedia definition is incredibly vague (“a word is
|
|||
|
the natural unit of data used by a particular processor design”). I
|
|||
|
originally thought that the word size was the same as your register size (64
|
|||
|
bits on x86-64). But according to section 4.1 (“Fundamental Data Types”) of the [Intel architecture manual][2],
|
|||
|
on x86 a word is 16 bits even though the registers are 64 bits. So I’m
|
|||
|
confused – is a word on x86 16 bits or 64 bits? Can it mean both, depending
|
|||
|
on the context? What’s the deal?
|
|||
|
|
|||
|
Now let’s talk about some possible reasons that we use 8-bit bytes!
|
|||
|
|
|||
|
#### reason 1: to fit the English alphabet in 1 byte
|
|||
|
|
|||
|
[This Wikipedia article][3] says that the IBM System/360 introduced the 8-bit byte in 1964.
|
|||
|
|
|||
|
Here’s a [video interview with Fred Brooks (who managed the project)][4] talking about why. I’ve transcribed some of it here:
|
|||
|
|
|||
|
> … the six bit bytes [are] really better for scientific computing and the 8-bit byte ones are really better for commercial computing and each one can be made to work for the other.
|
|||
|
> So it came down to an executive decision and I decided for the 8-bit byte, Jerry’s proposal.
|
|||
|
>
|
|||
|
> ...
|
|||
|
>
|
|||
|
> My most important technical decision in my IBM career was to go with the 8-bit byte for the 360.
|
|||
|
> And on the basis of I believe character processing was going to become important as opposed to decimal digits.
|
|||
|
|
|||
|
It makes sense that an 8-bit byte would be better for text processing: 2^6 is
|
|||
|
64, so 6 bits wouldn’t be enough for lowercase letters, uppercase letters, and symbols.
|
|||
|
|
|||
|
To go with the 8-bit byte, System/360 also introduced the [EBCDIC][5] encoding, which is an 8-bit character encoding.
|
|||
|
|
|||
|
It looks like the next important machine in 8-bit-byte history was the
|
|||
|
[Intel 8008][6], which was built to be
|
|||
|
used in a computer terminal (the Datapoint 2200). Terminals need to be able to
|
|||
|
represent letters as well as terminal control codes, so it makes sense for them
|
|||
|
to use an 8-bit byte.
|
|||
|
[This Datapoint 2200 manual from the Computer History Museum][7]
|
|||
|
says on page 7 that the Datapoint 2200 supported ASCII (7 bit) and EBCDIC (8 bit).
|
|||
|
|
|||
|
#### why was the 6-bit byte better for scientific computing?
|
|||
|
|
|||
|
I was curious about this comment that the 6-bit byte would be better for scientific computing. Here’s a quote from [this interview from Gene Amdahl][8]:
|
|||
|
|
|||
|
> I wanted to make it 24 and 48 instead of 32 and 64, on the basis that this
|
|||
|
> would have given me a more rational floating point system, because in floating
|
|||
|
> point, with the 32-bit word, you had to keep the exponent to just 8 bits for
|
|||
|
> exponent sign, and to make that reasonable in terms of numeric range it could
|
|||
|
> span, you had to adjust by 4 bits instead of by a single bit. And so it caused
|
|||
|
> you to lose some of the information more rapidly than you would with binary
|
|||
|
> shifting
|
|||
|
|
|||
|
I don’t understand this comment at all – why does the exponent have to be 8 bits
|
|||
|
if you use a 32-bit word size? Why couldn’t you use 9 bits or 10 bits if you
|
|||
|
wanted? But it’s all I could find in a quick search.
|
|||
|
|
|||
|
#### why did mainframes use 36 bits?
|
|||
|
|
|||
|
Also related to the 6-bit byte: a lot of mainframes used a 36-bit word size. Why? Someone pointed out
|
|||
|
that there’s a great explanation in the Wikipedia article on [36-bit computing][9]:
|
|||
|
|
|||
|
> Prior to the introduction of computers, the state of the art in precision
|
|||
|
> scientific and engineering calculation was the ten-digit, electrically powered,
|
|||
|
> mechanical calculator… These calculators had a column of keys for each digit,
|
|||
|
> and operators were trained to use all their fingers when entering numbers, so
|
|||
|
> while some specialized calculators had more columns, ten was a practical limit.
|
|||
|
>
|
|||
|
> Early binary computers aimed at the same market therefore often used a 36-bit
|
|||
|
> word length. This was long enough to represent positive and negative integers
|
|||
|
> to an accuracy of ten decimal digits (35 bits would have been the minimum)
|
|||
|
|
|||
|
So this 36 bit thing seems to based on the fact that log_2(20000000000) is 34.2. Huh.
|
|||
|
|
|||
|
My guess is that the reason for this is in the 50s, computers were
|
|||
|
extremely expensive. So if you wanted your computer to support ten decimal
|
|||
|
digits, you’d design so that it had exactly enough bits to do that, and no
|
|||
|
more.
|
|||
|
|
|||
|
Today computers are way faster and cheaper, so if you want to represent ten
|
|||
|
decimal digits for some reason you can just use 64 bits – wasting a little bit
|
|||
|
of space is usually no big deal.
|
|||
|
|
|||
|
Someone else mentioned that some of these machines with 36-bit word sizes let
|
|||
|
you choose a byte size – you could use 5 or 6 or 7 or 8-bit bytes, depending
|
|||
|
on the context.
|
|||
|
|
|||
|
#### reason 2: to work well with binary-coded decimal
|
|||
|
|
|||
|
In the 60s, there was a popular integer encoding called binary-coded decimal (or [BCD][10] for short) that
|
|||
|
encoded every decimal digit in 4 bits.
|
|||
|
|
|||
|
For example, if you wanted to encode the number 1234, in BCD that would be something like:
|
|||
|
|
|||
|
```
|
|||
|
0001 0010 0011 0100
|
|||
|
```
|
|||
|
|
|||
|
So if you want to be able to easily work with binary-coded decimal, your byte
|
|||
|
size should be a multiple of 4 bits, like 8 bits!
|
|||
|
|
|||
|
#### why was BCD popular?
|
|||
|
|
|||
|
This integer representation seemed really weird to me – why not just use
|
|||
|
binary, which is a much more efficient way to store integers? Efficiency was really important in early computers!
|
|||
|
|
|||
|
My best guess about why is that early computers didn’t have displays the same way we do
|
|||
|
now, so the contents of a byte were mapped directly to on/off lights.
|
|||
|
|
|||
|
Here’s a [picture from Wikipedia of an IBM 650 with some lights on its display][11] ([CC BY-SA 3.0][12]):
|
|||
|
|
|||
|
![][13]
|
|||
|
|
|||
|
So if you want people to be relatively able to easily read off a decimal number
|
|||
|
from its binary representation, this makes a lot more sense. I think today BCD
|
|||
|
is obsolete because we have displays and our computers can convert numbers
|
|||
|
represented in binary to decimal for us and display them.
|
|||
|
|
|||
|
Also, I wonder if BCD is where the term “nibble” for 4 bits comes from – in
|
|||
|
the context of BCD, you end up referring to half bytes a lot (because every
|
|||
|
digits is 4 bits). So it makes sense to have a word for “4 bits”, and people
|
|||
|
called 4 bits a nibble. Today “nibble” feels to me like an archaic term though –
|
|||
|
I’ve definitely never used it except as a fun fact (it’s such a fun word!). The Wikipedia article on [nibbles][14] supports this theory:
|
|||
|
|
|||
|
> The nibble is used to describe the amount of memory used to store a digit of
|
|||
|
> a number stored in packed decimal format (BCD) within an IBM mainframe.
|
|||
|
|
|||
|
Another reason someone mentioned for BCD was **financial calculations**. Today
|
|||
|
if you want to store a dollar amount, you’ll typically just use an integer
|
|||
|
amount of cents, and then divide by 100 if you want the dollar part. This is no
|
|||
|
big deal, division is fast. But apparently in the 70s dividing an integer
|
|||
|
represented in binary by 100 was very slow, so it was worth it to redesign how
|
|||
|
you represent your integers to avoid having to divide by 100.
|
|||
|
|
|||
|
Okay, enough about BCD.
|
|||
|
|
|||
|
#### reason 3: 8 is a power of 2?
|
|||
|
|
|||
|
A bunch of people said it’s important for a CPU’s byte size to be a power of 2.
|
|||
|
I can’t figure out whether this is true or not though, and I wasn’t satisfied with the explanation that “computers use binary so powers of 2 are good”. That seems very plausible but I wanted to dig deeper.
|
|||
|
And historically there have definitely been lots of machines that used byte sizes that weren’t powers of 2, for example (from [this retro computing stack exchange thread][15]):
|
|||
|
|
|||
|
- Cyber 180 mainframes used 6-bit bytes
|
|||
|
- the Univac 1100 / 2200 series used a 36-bit word size
|
|||
|
- the PDP-8 was a 12-bit machine
|
|||
|
|
|||
|
Some reasons I heard for why powers of 2 are good that I haven’t understood yet:
|
|||
|
|
|||
|
- every bit in a word needs a bus, and you want the number of buses to be a power of 2 (why?)
|
|||
|
- a lot of circuit logic is susceptible to divide-and-conquer techniques (I think I need an example to understand this)
|
|||
|
|
|||
|
Reasons that made more sense to me:
|
|||
|
|
|||
|
- it makes it easier to design **clock dividers** that can measure “8 bits were
|
|||
|
sent on this wire” that work based on halving – you can put 3 halving clock
|
|||
|
dividers in series. [Graham Sutherland][16] told me about this and made this really cool
|
|||
|
[simulator of clock dividers][17] showing what these clock dividers look like. That site (Falstad) also has a bunch of other example circuits and it seems like a really cool way to make circuit simulators.
|
|||
|
- if you have an instruction that zeroes out a specific bit in a byte, then if
|
|||
|
your byte size is 8 (2^3), you can use just 3 bits of your instruction to
|
|||
|
indicate which bit. x86 doesn’t seem to do this, but the [Z80’s bit testing instructions][18] do.
|
|||
|
- someone mentioned that some processors use [Carry-lookahead adders][19], and they work
|
|||
|
in groups of 4 bits. From some quick Googling it seems like there are a wide
|
|||
|
variety of adder circuits out there though.
|
|||
|
- **bitmaps**: Your computer’s memory is organized into pages (usually of size 2^n). It
|
|||
|
needs to keep track of whether every page is free or not. Operating systems
|
|||
|
use a bitmap to do this, where each bit corresponds to a page and is 0 or 1
|
|||
|
depending on whether the page is free. If you had a 9-bit byte, you would
|
|||
|
need to divide by 9 to find the page you’re looking for in the bitmap.
|
|||
|
Dividing by 9 is slower than dividing by 8, because dividing by powers of 2
|
|||
|
is always the fastest thing.
|
|||
|
|
|||
|
I probably mangled some of those explanations pretty badly: I’m pretty far out
|
|||
|
of my comfort zone here. Let’s move on.
|
|||
|
|
|||
|
#### reason 4: small byte sizes are good
|
|||
|
|
|||
|
You might be wondering – well, if 8-bit bytes were better than 4-bit bytes,
|
|||
|
why not keep increasing the byte size? We could have 16-bit bytes!
|
|||
|
|
|||
|
A couple of reasons to keep byte sizes small:
|
|||
|
|
|||
|
- It’s a waste of space – a byte is the minimum unit you can address, and if
|
|||
|
your computer is storing a lot of ASCII text (which only needs 7 bits), it
|
|||
|
would be a pretty big waste to dedicate 12 or 16 bits to each character when
|
|||
|
you could use 8 bits instead.
|
|||
|
- As bytes get bigger, your CPU needs to get more complex. For example you need one bus line per bit. So I guess simpler is better.
|
|||
|
|
|||
|
My understanding of CPU architecture is extremely shaky so I’ll leave it at
|
|||
|
that. The “it’s a waste of space” reason feels pretty compelling to me though.
|
|||
|
|
|||
|
#### reason 5: compatibility
|
|||
|
|
|||
|
The Intel 8008 (from 1972) was the precursor to the 8080 (from 1974), which was the precursor to the
|
|||
|
8086 (from 1976) – the first x86 processor. It seems like the 8080 and the
|
|||
|
8086 were really popular and that’s where we get our modern x86 computers.
|
|||
|
|
|||
|
I think there’s an “if it ain’t broke don’t fix it” thing going on here – I
|
|||
|
assume that 8-bit bytes were working well, so Intel saw no need to change the
|
|||
|
design. If you keep the same 8-bit byte, then you can reuse more of your
|
|||
|
instruction set.
|
|||
|
|
|||
|
Also around the 80s we start getting network protocols like TCP
|
|||
|
which use 8-bit bytes (usually called “octets”), and if you’re going to be
|
|||
|
implementing network protocols, you probably want to be using an 8-bit byte.
|
|||
|
|
|||
|
#### that’s all!
|
|||
|
|
|||
|
It seems to me like the main reasons for the 8-bit byte are:
|
|||
|
|
|||
|
- a lot of early computer companies were American, the most commonly used language in the US is English
|
|||
|
- those people wanted computers to be good at text processing
|
|||
|
- smaller byte sizes are in general better
|
|||
|
- 7 bits is the smallest size you can fit all English characters + punctuation in
|
|||
|
- 8 is a better number than 7 (because it’s a power of 2)
|
|||
|
- once you have popular 8-bit computers that are working well, you want to keep the same design for compatibility
|
|||
|
|
|||
|
Someone pointed out that [page 65 of this book from 1962][20]
|
|||
|
talking about IBM’s reasons to choose an 8-bit byte basically says the same thing:
|
|||
|
|
|||
|
- Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
|
|||
|
- Within the limits of this capacity, a single character is represented by a
|
|||
|
single byte, so that the length of any particular record is not dependent on
|
|||
|
the coincidence of characters in that record.
|
|||
|
- 8-bit bytes are reasonably economical of storage space
|
|||
|
- For purely numerical work, a decimal digit can be represented by only 4
|
|||
|
bits, and two such 4-bit bytes can be packed in an 8-bit byte. Although such
|
|||
|
packing of numerical data is not essential, it is a common practice in
|
|||
|
order to increase speed and storage efficiency. Strictly speaking, 4-bit
|
|||
|
bytes belong to a different code, but the simplicity of the 4-and-8-bit
|
|||
|
scheme, as compared with a combination 4-and-6-bit scheme, for example,
|
|||
|
leads to simpler machine design and cleaner addressing logic.
|
|||
|
- Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer
|
|||
|
to take advantage of powerful features of binary addressing and indexing to
|
|||
|
the bit level (see Chaps. 4 and 5 ) .
|
|||
|
|
|||
|
>
|
|||
|
|
|||
|
Overall this makes me feel like an 8-bit byte is a pretty natural choice if
|
|||
|
you’re designing a binary computer in an English-speaking country.
|
|||
|
|
|||
|
--------------------------------------------------------------------------------
|
|||
|
|
|||
|
via: https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes/
|
|||
|
|
|||
|
作者:[Julia Evans][a]
|
|||
|
选题:[lkxed][b]
|
|||
|
译者:[译者ID](https://github.com/译者ID)
|
|||
|
校对:[校对者ID](https://github.com/校对者ID)
|
|||
|
|
|||
|
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
|||
|
|
|||
|
[a]: https://jvns.ca/
|
|||
|
[b]: https://github.com/lkxed/
|
|||
|
[1]: https://social.jvns.ca/@b0rk/109976810279702728
|
|||
|
[2]: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
|
|||
|
[3]: https://en.wikipedia.org/wiki/IBM_System/360
|
|||
|
[4]: https://www.youtube.com/watch?v=9oOCrAePJMs&t=140s
|
|||
|
[5]: https://en.wikipedia.org/wiki/EBCDIC
|
|||
|
[6]: https://en.wikipedia.org/wiki/Intel_8008
|
|||
|
[7]: https://archive.computerhistory.org/resources/text/2009/102683240.05.02.acc.pdf
|
|||
|
[8]: https://archive.computerhistory.org/resources/access/text/2013/05/102702492-05-01-acc.pdf
|
|||
|
[9]: https://en.wikipedia.org/wiki/36-bit_computing
|
|||
|
[10]: https://en.wikipedia.org/wiki/Binary-coded_decimal
|
|||
|
[11]: https://commons.wikimedia.org/wiki/File:IBM-650-panel.jpg
|
|||
|
[12]: http://creativecommons.org/licenses/by-sa/3.0/
|
|||
|
[13]: https://upload.wikimedia.org/wikipedia/commons/a/ad/IBM-650-panel.jpg
|
|||
|
[14]: https://en.wikipedia.org/wiki/Nibble
|
|||
|
[15]: https://retrocomputing.stackexchange.com/questions/7937/last-computer-not-to-use-octets-8-bit-bytes
|
|||
|
[16]: https://poly.nomial.co.uk/
|
|||
|
[17]: https://www.falstad.com/circuit/circuitjs.html?ctz=CQAgjCAMB0l3BWcMBMcUHYMGZIA4UA2ATmIxAUgpABZsKBTAWjDACgwEknsUQ08tQQKgU2AdxA8+I6eAyEoEqb3mK8VMAqWSNakHsx9Iywxj6Ea-c0oBKUy-xpUWYGc-D9kcftCQo-URgEZRQERSMnKkiTSTDFLQjw62NlMBorRP5krNjwDP58fMztE04kdKsRFBQqoqoQyUcRVhl6tLdCwVaonXBO2s0Cwb6UPGEPXmiPPLHhIrne2Y9q8a6lcpAp9edo+r7tkW3c5WPtOj4TyQv9G5jlO5saMAibPOeIoppm9oAPEEU2C0-EBaFoThAAHoUGx-mA8FYgfNESgIFUrNDYVtCBBttg8LiUPR0VCYWhyD0Wp0slYACIASQAamTIORFqtuucQAzGTQ2OTaD9BN8Soo6Uy8PzWQ46oImI4aSB6QA5ZTy9EuVQjPLq3q6kQmAD21Beome0qQMHgkDIhHCYVEfCQ9BVbGNRHAiio5vIltg8Ft9stXg99B5MPdFK9tDAFqg-rggcIDui1i23KZfPd3WjPuoVoDCiDjv4gjDErYQA
|
|||
|
[18]: http://www.chebucto.ns.ca/~af380/z-80-h.htm
|
|||
|
[19]: https://en.wikipedia.org/wiki/Carry-lookahead_adder
|
|||
|
[20]: https://web.archive.org/web/20170403014651/http://archive.computerhistory.org/resources/text/IBM/Stretch/pdfs/Buchholz_102636426.pdf
|