I’ve been working on a zine about how computers represent things in binary, and one question I’ve gotten a few times is – why does the x86 architecture use 8-bit bytes? Why not some other size?
I’m not super into computer history (I like to use computers a lot more than I like reading about them), but I am always curious if there’s an essential reason for why a computer thing is the way it is today, or whether it’s mostly a historical accident. So we’re going to talk about some computer history.
As an example of a historical accident: DNS has a `class` field which has 5 possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”). To me that’s a clear example of a historical accident – I can’t imagine that we’d define the class field the same way if we could redesign DNS today without worrying about backwards compatibility. I’m not sure if we’d use a class field at all!
There aren’t any definitive answers in this post, but I asked [on Mastodon][1] and here are some potential reasons I found for the 8-bit byte. I think the answer is some combination of these reasons.
#### what’s the difference between a byte and a word?
First, this post talks about “bytes” and “words” a lot. What’s the difference between a byte and a word? My understanding is:
- the **byte size** is the smallest unit you can address. For example in a program on my machine `0x20aa87c68` might be the address of one byte, then `0x20aa87c69` is the address of the next byte.
- The **word size** is some multiple of the byte size. I’ve been confused about this for years, and the Wikipedia definition is incredibly vague (“a word is the natural unit of data used by a particular processor design”). I originally thought that the word size was the same as your register size (64 bits on x86-64). But according to section 4.1 (“Fundamental Data Types”) of the [Intel architecture manual][2], on x86 a word is 16 bits even though the registers are 64 bits. So I’m confused – is a word on x86 16 bits or 64 bits? Can it mean both, depending on the context? What’s the deal?
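To make “the smallest unit you can address” concrete, here’s a quick Python sketch using `ctypes` to look at real memory addresses – adjacent addresses hold adjacent 8-bit bytes:

```python
import ctypes

# create_string_buffer allocates a real C buffer we can take addresses into
buf = ctypes.create_string_buffer(b"abcd")
base = ctypes.addressof(buf)

# each address names exactly one 8-bit byte: base holds 'a',
# and base + 1 -- the very next address -- holds 'b'
print(ctypes.string_at(base, 1))      # b'a'
print(ctypes.string_at(base + 1, 1))  # b'b'
```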
Now let’s talk about some possible reasons that we use 8-bit bytes!
#### reason 1: to fit the English alphabet in 1 byte
[This Wikipedia article][3] says that the IBM System/360 introduced the 8-bit byte in 1964.
Here’s a [video interview with Fred Brooks (who managed the project)][4] talking about why. I’ve transcribed some of it here:
> … the six bit bytes [are] really better for scientific computing and the 8-bit byte ones are really better for commercial computing and each one can be made to work for the other.
> So it came down to an executive decision and I decided for the 8-bit byte, Jerry’s proposal.
>
> ...
>
> My most important technical decision in my IBM career was to go with the 8-bit byte for the 360.
> And on the basis of I believe character processing was going to become important as opposed to decimal digits.
It makes sense that an 8-bit byte would be better for text processing: 2^6 is 64, and 26 lowercase letters + 26 uppercase letters + 10 digits is already 62 characters, so 6 bits leaves almost no room for punctuation and symbols.
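Here’s a quick sanity check of that arithmetic in Python:

```python
import string

# lowercase + uppercase + digits + punctuation (space and control codes excluded)
chars = string.ascii_lowercase + string.ascii_uppercase + string.digits + string.punctuation
print(len(chars))         # 94 printable characters
print(len(chars) > 2**6)  # True: they don't fit in 6 bits
print(len(chars) < 2**7)  # True: they fit in 7 bits, with room to spare in 8
```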
It looks like the next important machine in 8-bit-byte history was the [Intel 8008][6], which was built to be used in a computer terminal (the Datapoint 2200). Terminals need to be able to represent letters as well as terminal control codes, so it makes sense for them to use an 8-bit byte. [This Datapoint 2200 manual from the Computer History Museum][7] says on page 7 that the Datapoint 2200 supported ASCII (7 bit) and EBCDIC (8 bit).
#### why was the 6-bit byte better for scientific computing?
I was curious about this comment that the 6-bit byte would be better for scientific computing. Here’s a quote from [this interview with Gene Amdahl][8]:
> I wanted to make it 24 and 48 instead of 32 and 64, on the basis that this
> would have given me a more rational floating point system, because in floating
> point, with the 32-bit word, you had to keep the exponent to just 8 bits for
> exponent sign, and to make that reasonable in terms of numeric range it could
> span, you had to adjust by 4 bits instead of by a single bit. And so it caused
> you to lose some of the information more rapidly than you would with binary
I don’t understand this comment at all – why does the exponent have to be 8 bits if you use a 32-bit word size? Why couldn’t you use 9 bits or 10 bits if you wanted? But it’s all I could find in a quick search.
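(For what it’s worth, the System/360 ended up with a base-16 “hexadecimal” floating point format – a 7-bit exponent that scales by factors of 16, which I think is the “adjust by 4 bits” Amdahl is describing.) As a modern comparison, here’s a Python sketch that pulls the fields out of a 32-bit IEEE 754 float, which does use an 8-bit exponent:

```python
import struct

# reinterpret the 32 bits of an IEEE 754 single-precision float as an integer
bits = struct.unpack(">I", struct.pack(">f", 1.0))[0]

sign     = bits >> 31           # 1 bit
exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
mantissa = bits & 0x7FFFFF      # 23 bits of fraction

print(sign, exponent, mantissa)  # 0 127 0, i.e. 1.0 = +1.0 * 2**(127 - 127)
```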
Also related to the 6-bit byte: a lot of mainframes used a 36-bit word size. Why? Someone pointed out that there’s a great explanation in the Wikipedia article on [36-bit computing][9] – briefly, early computers were designed to match the precision of the 10-digit mechanical calculators of the day, and 36 bits is the smallest word size that can hold a signed 10-digit decimal number. It’s also a convenient multiple of the 6-bit character size.

Today computers are way faster and cheaper, so if you want to represent ten decimal digits for some reason you can just use 64 bits – wasting a little bit of space is usually no big deal.
Someone else mentioned that some of these machines with 36-bit word sizes let you choose a byte size – you could use 5 or 6 or 7 or 8-bit bytes, depending on the context.
Another reason that came up was **binary-coded decimal**, or BCD: instead of storing an integer in binary, you store each of its decimal digits separately, 4 bits per digit – so 1234 becomes the nibbles 1, 2, 3, 4, and two decimal digits fit exactly in an 8-bit byte. This integer representation seemed really weird to me – why not just use binary, which is a much more efficient way to store integers? Efficiency was really important in early computers!
My best guess about why is that early computers didn’t have displays the same way we do now, so the contents of a byte were mapped directly to on/off lights.
So if you want people to be able to relatively easily read off a decimal number from its binary representation, BCD makes a lot more sense. I think BCD is obsolete today because we have displays, and our computers can convert numbers represented in binary to decimal for us and display them.
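Here’s a rough sketch of BCD in Python (`bcd_encode` and `bcd_decode` are just names I made up, not a real library):

```python
def bcd_encode(n):
    """Pack a non-negative integer into BCD: one decimal digit per 4-bit nibble."""
    digits = [int(d) for d in str(n)]
    if len(digits) % 2:            # pad with a leading 0 to fill whole bytes
        digits.insert(0, 0)
    return bytes((hi << 4) | lo for hi, lo in zip(digits[::2], digits[1::2]))

def bcd_decode(data):
    """Unpack BCD bytes back into an integer."""
    return int("".join(f"{b >> 4}{b & 0xF}" for b in data))

packed = bcd_encode(1234)
print(packed.hex())        # '1234' -- the hex dump reads as the decimal digits!
print(bcd_decode(packed))  # 1234
```

That `packed.hex()` line is the whole appeal: with BCD, a human looking at the raw bits can read off the decimal number directly.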
Also, I wonder if BCD is where the term “nibble” for 4 bits comes from – in the context of BCD, you end up referring to half bytes a lot (because every digit is 4 bits). So it makes sense to have a word for “4 bits”, and people called 4 bits a nibble. Today “nibble” feels to me like an archaic term though – I’ve definitely never used it except as a fun fact (it’s such a fun word!). The Wikipedia article on [nibbles][14] supports this theory.
Another reason someone mentioned for BCD was **financial calculations**. Today if you want to store a dollar amount, you’ll typically just use an integer amount of cents, and then divide by 100 if you want the dollar part. This is no big deal, division is fast. But apparently in the 70s dividing an integer represented in binary by 100 was very slow, so it was worth it to redesign how you represent your integers to avoid having to divide by 100.
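The integer-cents approach is just one division at display time:

```python
# store money as an integer number of cents; only divide by 100 for display
price_cents = 1999
dollars, cents = divmod(price_cents, 100)  # this division is what was slow in the 70s
print(f"${dollars}.{cents:02d}")  # $19.99
```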
A bunch of people said it’s important for a CPU’s byte size to be a power of 2. I can’t figure out whether this is true or not though, and I wasn’t satisfied with the explanation that “computers use binary so powers of 2 are good”. That seems very plausible but I wanted to dig deeper. And historically there have definitely been lots of machines that used byte or word sizes that weren’t powers of 2 (there are some examples in [this retro computing stack exchange thread][15]). Here are some reasons people gave for why powers of 2 are good:
- it makes it easier to design **clock dividers** – circuits that can count off “8 bits were sent on this wire” by repeated halving: you can put 3 halving clock dividers in series. [Graham Sutherland][16] told me about this and made this really cool [simulator of clock dividers][17] showing what they look like. That site (Falstad) also has a bunch of other example circuits and it seems like a really cool way to make circuit simulators.
- if you have an instruction that zeroes out a specific bit in a byte, then if your byte size is 8 (2^3), you can use just 3 bits of your instruction to indicate which bit. x86 doesn’t seem to do this, but the [Z80’s bit testing instructions][18] do.
- someone mentioned that some processors use [Carry-lookahead adders][19], and they work in groups of 4 bits. From some quick Googling it seems like there are a wide variety of adder circuits out there though.
- **bitmaps**: Your computer’s memory is organized into pages (usually of size 2^n). It needs to keep track of whether every page is free or not. Operating systems use a bitmap to do this, where each bit corresponds to a page and is 0 or 1 depending on whether the page is free. If you had a 9-bit byte, you would need to divide by 9 to find the page you’re looking for in the bitmap. Dividing by 9 is slower than dividing by 8, because dividing by 8 is just a 3-bit right shift, while dividing by 9 needs an actual division.
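Here’s a little Python sketch of the bitmap argument (the helper names are made up): with 8-bit bytes, going from a page number to its bit is a shift and a mask, with no division required:

```python
# a bitmap tracking 1024 pages: one bit per page, 8 pages per byte
bitmap = bytearray(128)

def mark_used(page):
    # page >> 3 is page // 8, and page & 7 is page % 8 -- both single instructions
    bitmap[page >> 3] |= 1 << (page & 7)

def is_used(page):
    return bool(bitmap[page >> 3] & (1 << (page & 7)))

mark_used(42)
print(is_used(42), is_used(43))  # True False
# with 9-bit bytes you'd need page // 9 and page % 9: real divisions
```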
I probably mangled some of those explanations pretty badly: I’m pretty far out of my comfort zone here. Let’s move on.
#### why not a bigger byte size?

One argument I heard for why a *bigger* byte size (like 12 or 16 bits) didn’t win out:

- It’s a waste of space – a byte is the minimum unit you can address, and if your computer is storing a lot of ASCII text (which only needs 7 bits), it would be a pretty big waste to dedicate 12 or 16 bits to each character when you could use 8 bits instead.
My understanding of CPU architecture is extremely shaky so I’ll leave it at that. The “it’s a waste of space” reason feels pretty compelling to me though.
The Intel 8008 (from 1972) was the precursor to the 8080 (from 1974), which was the precursor to the 8086 (from 1978) – the first x86 processor. It seems like the 8080 and the 8086 were really popular and that’s where we get our modern x86 computers.
I think there’s an “if it ain’t broke don’t fix it” thing going on here – I assume that 8-bit bytes were working well, so Intel saw no need to change the design. If you keep the same 8-bit byte, then you can reuse more of your instruction set.
Also around the 80s we start getting network protocols like TCP which use 8-bit bytes (usually called “octets”), and if you’re going to be implementing network protocols, you probably want to be using an 8-bit byte.
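For example, Python’s `struct` module describes every wire format in terms of octets – here’s a sketch packing two made-up 16-bit header fields (an ID and a port, purely for illustration) in network byte order:

```python
import struct

# "!" means network byte order (big-endian); "H" is one 16-bit field = 2 octets
header = struct.pack("!HH", 0x1234, 80)

print(header.hex())  # '12340050'
print(len(header))   # 4 -- two 16-bit fields take exactly 4 octets
```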
Finally, here’s a list of reasons for choosing an 8-bit byte – I believe it’s quoted from Werner Buchholz’s 1962 book *Planning a Computer System*, about the IBM Stretch:

> 1. Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
> 2. Within the limits of this capacity, a single character is represented by a single byte, so that the length of any particular record is not dependent on the coincidence of characters in that record.
> 3. 8-bit bytes are reasonably economical of storage space
> 4. For purely numerical work, a decimal digit can be represented by only 4 bits, and two such 4-bit bytes can be packed in an 8-bit byte. Although such packing of numerical data is not essential, it is a common practice in order to increase speed and storage efficiency. Strictly speaking, 4-bit bytes belong to a different code, but the simplicity of the 4-and-8-bit scheme, as compared with a combination 4-and-6-bit scheme, for example, leads to simpler machine design and cleaner addressing logic.
> 5. Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer to take advantage of powerful features of binary addressing and indexing to the bit level (see Chaps. 4 and 5).