Two takes on multiplication

Two takes on multiplication

Apparently, people actually read this blog (who knew!), so why not write about a topic I find interesting: multiplication. Multiplying two values together (as opposed to multiplying one value with a constant) is actually a difficult computational task. This was difficult to do in mechanical analog computers (shout out to any potential future mechanical engineers reading this blog), is difficult to do in electrical analog computers, and is still relatively non-trivial to do with stored program digital computers.

Take a look at the front page of the datasheet for the ATmega328P microcontroller (link goes to a PDF). It brags about its two cycle multiplier!

A screenshot of a single section of the ATmega328P datasheet. It states "Advanced RISC architecture ... On-chip 2-cycle multiplier."

When you multiply two values together on a computer (or what most people think of when they hear the word “computer”) or using a microcontroller, there is physical hardware somewhere that is used in computing the result. In floating-point arithmetic, multiplication may be a relatively complicated set of instructions (many, many lines of software which use lots of different pieces of digital hardware, one or more of which may be a multiplier). In fixed-point integer arithmetic, this can be carried out in one or few instructions but may require multiplication hardware of one form or another. As much as I’d like to get into a long digression about the tradeoff between software and hardware, maybe that is best saved for a different post.

Note that this post only applies to unsigned multiplication. Signed multiplication is more complicated and opens up several cans of worms that prefer to remain closed. The worms thank you. If you’d like to learn more, you should take my digital systems class! Or read about it in the textbook I wrote (link goes to a PDF).

Multiplier the first: using the algorithm

When you learned how to multiply integers (very likely using the base 10 number system), you may have learned the same algorithm I did. You write the two numbers down and right-justify them. The top number is called the multiplier, and the bottom number is called the multiplicand. (I hate definitions too, but I promise that being able to distinguish between these two values is helpful.) You start with the two digits on the right (the ones place of the multiplier and the ones place of the multiplicand) and multiply them together (this is why we learned the times table, so we could write down that result). Then you take the right-most (ones place) digit on the multiplicand and multiply it with the next (tens place) digit on the multiplier and compute that result. You repeat this process until you’ve multiplied the ones place digit in the multiplicand by every digit in the multiplier. (Note that there may be some addition steps that occur due to carries, which I’m going to gloss over as it’s irrelevant to binary multiplication.)

Then you create a second line in your result, put a spacer in to indicate that you’ve increased by a power of ten, then go to the tens place digit in the multiplicand and repeat the above process (multiplying the tens place digit in the multiplicand by every digit in the multiplier) using the times tables you memorized. Then you go to the hundreds place digit in the multiplicand, and then the thousands place digit, and so on, until you’ve run out of digits in the multiplicand.

Are you done? Of course not! Now you must add each line of your result together. Finally, you have completed your arithmetic task. Great job!

The image below shows my attempt at doing multiplication by hand. My third grade teacher should be proud. However, this was not exactly fun, and I definitely checked my work afterward using a calculator. (You may ignore the carries, as they are irrelevant to binary multiplication, and I don’t want to waste any more time on base 10 arithmetic.)

By-hand multiplication of 378 x 512.

In binary, there are only two numbers: zero and one. This vastly simplifies the times tables. Your results are either zero or one and the result is simply the AND operation. However, the process of multiplication is the same. We use the same algorithm as above. We can use Boolean algebra and logical steps to define this algorithm:

  1. AND the LSB in the multiplicand with each digit in the multiplier, store the result in a register.
  2. Shift the multiplicand right one bit so the twos place digit is now the LSB.
  3. Shift the multiplicand left one bit so the ones place digit is now the twos place digit. (This causes an issue that will be explained below.)
  4. AND the new LSB digit in the multiplicand with each digit in the multiplier.
  5. ADD the previous sum to the new sum (from step 4).
  6. Repeat steps 2-5 until you run out of digits in the multiplicand.

The algorithm is great: all you need to do is shift, AND, and ADD. There are AND gates, adders, and shift registers, so the hardware to do all of these things already exists! However (of course there’s a however) there are a couple of downsides to this algorithm.

Notably, the multiplier needs to be shifted left in order to increase the significance of the result by a power of two each time we do this. This was equivalent in the example above to putting in the spacer when we moved to the tens place in the multiplicand, and two spacers when we moved to the hundreds place in the multiplicand, etc. This means that if we are multiplying two n-bit values together, we need to have a register that has 2n bits to accommodate this shifting. Not only does the multiplier need to be stored in a 2n-bit register, but the result also needs to be stored in a 2n-bit register to prevent any overflow errors.

The second downside is that if we are multiplying two n-bit values together, it will take n clock cycles to complete the arithmetic operation. (Actually, this becomes an upside once you learn about the second multiplier. It’s a downside compared to a strict combinational logic approach, which would have this accomplished without a clock, but we’d pay for it dearly in breadboard rats-nests.)

The third downside is that there will be at least three registers being used to do this (multiplier, multiplicand, result), and each one requires control signals dictating when to shift, when to hold, and when to load. This means our design will require a finite-state machine. The control logic will be non-trivial. We’ll need a bunch of different states, and this could become unwieldy if we want to multiply two values together where n is large.

In fact, I got clever and realized that the storage register doesn’t need to shift, and it’s possible to hold it by inhibiting the clock, so I used two quad D flip-flop chips (74175) to store the result. This saved some money, as the 74175 is less than half the price of a 4-bit bidirectional shift register (74194). (They do have the same footprint, so no space was saved.)

My design has n = 4, and that n value is fixed. This design is not scalable. I would have to completely redesign the finite-state machine if I wanted to expand this to multiply, say, two 8-bit unsigned numbers together. It also multiplies only two digits together. You cannot then accumulate a result by multiplying another value to the result, and so on, like you could in a calculator. (This later caveat also applies to the second multiplication approach; it cannot accumulate a running result.)

The finite-state machine has six states, shown in the state diagram below. This is necessary to allow for three shift operations. I used a single quad D flip-flop chip to reduce the footprint of using two dual flip-flop chips. The tradeoff is that there is no asynchronous preset pin on the quad chip. If I were to redesign this today, I would probably try to see if I could get clever enough with asynchronous pins to reduce the number of states. I know I could reduce this design down to five states if I built this today. I’m not sure about four, which would enable me to use a single dual chip. As with all things, I’m sure that reducing complexity in one area would increase it elsewhere.

In the state diagram, the synchronous outputs are in the form WXYZ. W and X are the mode select signals S1 and S0, which tell the shift registers to shift left, shift right, load, or hold. Y is the signal sent to the active-LOW clear pins on the result flip-flops. Z is an enable signal that is used to inhibit the D flip-flops acting as the result storage register when the operation is complete.

The next-state logic was relatively complicated and required a fair amount of logic. This is one reason why this design required a few more chips than the second multiplier. It’s just a bulky circuit.

An image of the state diagram for the multiplier finite-state machine. It is described in the text.

The load state has a state assignment of 000 (out of character for me). I used an active-LOW pushbutton to clear out the finite-state machine flip-flops (and get into that load state). This signal also cleared out the registers holding the multiplicand and multiplier. Because this is the “reset” state (it will occur as soon as you power the circuit), it means that there will immediately be a multiplication operation upon resetting the power signal to the circuit, regardless of whether or not you wanted to multiply. As there is no clear or resetting pushbutton, that means the only way to clear out a value is to multiply two different numbers together.

The three shift states simply execute the algorithm of shifting the multiplicand and multipliers. The AND operation occurs with combinational logic, so that’s baked into the circuit design and doesn’t require a separate state in the finite-state machine (thank goodness).

The hold state causes the multiplier and multiplicand registers to both hold, whereas the result flip-flops need an additional update to store the final computed sum. In the done state, everything goes into idle mode (registers hold, the result flip-flops are clock inhibited).

The clock frequency I’m using on this design is a whopping 68 Hz. I don’t have any notes in my design notebook about trying out different frequencies, so it’s possible I just used something fast enough that I couldn’t see it calculating, slow enough that it worked without having to clean up the timer signal, and then just used that value. (I’ve learned a lot in the past 5 years.)

This PCB is 3.85″ x 3.79″, resulting in an area of 14.6 in2. Note that I used a calculator to compute this value. (Please don’t report me to the authorities for not using SI units. I apologize to any other purists who may be reading this.)

A photograph of the completed circuit is shown below, demonstrating 4×8 (left) and 11×8 (right). The displays (which I’ll chat more about at the conclusion of this post) are displaying the correct results in both cases, but you may have to either squint or just take my word for it.

Two side-by-side photographs of the first multiplier circuit. On the left, it is displaying 4x8. On the right, it is displaying 11x8.

Multiplier the second: using a counter

As I was sitting in my office one day, Cole S., a digital systems student, came in and said something along the lines of “Dr. P, if multiplication is just adding X to itself Y times, why can’t we just use a counter and add instead of doing shift and AND?” Cole is correct, when you multiply X by Y, you can obtain a result by adding X to itself Y times. Thanks to Cole’s comment, in a moment of boredom about a year later, I decided to take this idea to the breadboard and give it a go.

This approach does not use an algorithm (per se) but instead uses a counter chip and clever use of control signals. Here’s how this works:

  1. Load the multiplicand into a down counter, and load the multiplier into a register.
  2. Each clock cycle, if the counter is NOT zero:
    1. Allow the result register to accumulate the new value by loading the old sum plus the multiplier.
    2. Decrement the counter.

That’s it! The algorithm is relatively straightforward, and can be scaled up if you increase n. What’s the problem? The problem is that while the previous multiplier required n clock cycles to compute a result, this requires 2n-1 clock cycles to compute a result. So while this may be OK for computing small numbers, it is not appropriate for anything that needs to be scaled up.

My design again is based on a 4-bit unsigned architecture. The clever part of this approach is that the 74191 is the perfect counter to use as it can be asynchronously parallel loaded, can count down, and asserts a MIN/MAX signal when it has reached its largest (DCBA = 1111) value when up-counting or smallest (DCBA = 0000) value when down-counting. Because I don’t know how large the multiplicand is, I can load the multiplicand and then decrement to zero, and use the MIN/MAX signal as a control to indicate that the output is zero. Otherwise, if I up-counted, I’d need some type of comparator to compare the result on the output pins with the multiplicand, and that would require extra hardware. The 74191 chip also has an active-LOW CTEN pin that can act as a clock inhibit, which is very useful in stopping the counter once we’re done decrementing.

Other cleverness includes using a state assignment of 11 for the load state (step 1 in the list above) to use asynchronous preset on the flip-flop chip. (Naturally, the reset state assignment is 00.)

While there is a finite-state machine required to deal with control logic, it is much simpler than that used in the first multiplier. There are three states (and would continue to be three states if I wanted to use a larger counter and wider registers), and the logic for the finite-state machine was very simple. The JK flip-flop won out with the simplest next-state equations which required zero additional logic gates. To avoid setup issues, a falling-edge triggered clock is necessary. I used a 555 timer with a frequency of approximately 48 kHz and cleaned it up with a Schmitt inverter.

Another nice aspect of this design is that the multiplier does not need to be stored in a 2n-bit register because it never needs to be shifted. However, the result does need to be stored into a 2n-bit register if you want to avoid overflow errors. The multiplicand does not need to be stored in a register at all, as the storage occurs in the counter, which then decrements each clock cycle. In addition, not only are there fewer registers than in the previous design, each one (multiplier and result) shares the same mode select bits S1 and S0. The mode select bits tell the 74194 if it should shift left, shift right, hold, or load. Furthermore, each of the individual mode select bits are equal (that is, S1 = S0) because the registers will only load or hold. This simplifies the logic greatly.

You can see this in action on a breadboard where the multiplier is 3 and the multiplicand is 4 in the video below. Note that 3 is added to itself 4 times. The clock is quite slow so you can see the result accumulate as the counter decrements. The result is shown on a bar LED (not decoded into decimal). There are some diagnostic LEDs: the blue LED is the clock signal, the red LEDs are the output value (DCBA) of the counter, and the yellow LED is the signal asserted on the MIN/MAX pin of the counter chip. There are also some yellow LEDs off to the left-hand side of the video (I think they go out of frame at some point) which show the current values on each flip-flop. (I apologize for the video quality, as it is completely un-edited. The sound is just ambient noise, so I recommend playing it on mute or with the volume down.)

The finite-state machine, as promised, is relatively straightforward. There are three states. The state diagram is shown below. The synchronous outputs are in order of XYZ, where X is the value of each mode select bit on each register, Y is the value sent to the LOAD pin on the counter, and Z is the value sent to the CTEN (clock enable) pin on the counter.

The load state is entered into when an active-LOW pushbutton is pressed. This asserts the active-LOW preset pins on the JK flip-flops and gets the finite-state machine into state 11. The mode select bits on each register are all 0 (hold), the active-LOW counter load pin is asserted, and the active-LOW clock enable pin is asserted, allowing the clock to begin decrementing the counter. If the MIN pin on the counter is asserted, the multiplication is done (multiplicand = 0, we’re done multiplying), the state machine goes to reset. Otherwise, we’re not done decrementing and it’s time to add.

In the add state, the registers will synchronously load (mode select bits will all be 1), the counter will not load, but the clock input to the counter chip is enabled. We will remain in this state as long as the MIN pin of the counter is not asserted. Once the MIN pin asserts (we’ve decremented a number of times equal to the multiplicand), we go into the reset state.

The reset state just chills out and does nothing. The registers hold. The counter does not load. The counter clock enable is de-asserted, so the value on the counter will remain at zero. It is not possible to leave this state synchronously. There is also a clear/reset pushbutton on the circuit that will asynchronously enter into this state (that pushbutton also clears the contents of each register).

This PCB is 3.44″ x 3.70″, which is an area of 12.7 in2. This is approximately 87% of the area of the first multiplier, so while there is a smaller footprint, it’s not a huge difference. I was slightly disappointed in my inability to squeeze this down any smaller. Both PCBs were space-optimized as much as possible.

A photograph of the completed circuit is shown below, demonstrating 4×8 (left) and 11×8 (right). Thankfully, these displays are much easier to read (although possibly slightly too bright in the photos I took).

Two side-by-side photographs of the second multiplier circuit. On the left, it is displaying 4x8. On the right, it is displaying 11x8.

Comparison and conclusion (finally)

Here is a rough comparison between the two designs, in the same format as I used when looking at digital project design strategies.

multiplier 1multiplier 2
power consumption0.86 W when displaying 0x0
1.66 W when displaying 15×15
1.15 W when displaying 0x0
1.38 W when displaying 15×15
size14.6 in212.7 in2
cost$116$106
funthis one frustrated me, but it was one of my first big designs, so I’ll cut it a little bit of slack and give this an arbitrary rating of 7/10this was a fun challenge and I enjoyed trying to be as clever as possible with the hardware to streamline the design, so I’ll give this an 9/10
otherFSM is not scalable

While the size of the two PCBs is roughly similar, I would (if making it today) make some changes to multiplier 1 (using the multiplication algorithm). To save space, I’d use a bussed resistor for pull-downs. If I only made this change, I could probably get the two boards down to the same area. However, I’d also use a 7414 to clean up the 555 timer. Because there’s no need for an inverter chip on this design, that would add a chip, causing the footprint to remain roughly the same as it is now.

Both of the multipliers require a fair amount of real-estate for the BCD decoding… if only people could read binary as easily as base 10, that would not be necessary. But alas, we must design our hardware to be usable by the masses, and not only digital systems students.

I calculated prices using Jameco and Sparkfun’s prices for single parts as of July 12, 2024. I also used the price per square-inch that Oshpark charges today. The first multiplier, being 5 years old, was likely much cheaper when I ordered it. In both cases, the PCB is roughly 2/3 of the cost of the entire design. The 7448 chip is also relatively high-priced, almost 2.5x more expensive than the 7447, but saves me from having to use 21 current-limiting resistors which would be necessary if I used the 7447 chip.

As far as power consumption goes, the two designs have a couple notable differences. Multiplier 1 has 220 Ω pull-downs whereas multiplier 2 (using the counter) has 330 Ω pull-downs. It’s not a huge difference, but would explain the difference in power consumption when multiplying 15×15 when all switches on the DIP switch are ON.

The two segmented displays are different. I tried to find datasheets on both displays, but can’t find the datasheet for the display used on multiplier 2. Qualitatively, to my eyes, the display on multiplier 1 is less bright than the display on multiplier 2. However the numerals are larger on multiplier 1. Otherwise, the displays are both common-cathode and use the 7448 decoder, so the decoding circuitry is identical between circuits. I imagine some difference in power consumption must be due to the different displays. I have a photo of both multipliers which I think very poignantly illuminates (pun intended) the difference in displays. Both are showing the 11×8 operation. Multiplier 2 is on the left, and multiplier 1 is on the right. The contrast on the displays on multiplier 1 is so poor as to be nearly unreadable.

A photograph of multiplier 2 (left) and multiplier 1 (right) both displaying the result of 11 times 8.

Note that I did figure out how to do leading zero suppression using the 7448 chip between the two designs. I think it makes the output even more readable in the case of multiplier 2. It was extremely simple to do and I will use it in future designs. I was surprised that it didn’t lead to a huge power reduction, which makes me think that the displays themselves didn’t contribute a lot to the power budget.

I measured power consumption during static operation, after the finite-state machine had concluded its multiplication and was in its reset state. However, in both circuits, a 555 timer is chugging away even when the circuit is idling. This may not be the most efficient use of power, and it is possible to inhibit a 555 timer, which I didn’t bother to do in either design. If I learned anything in my undergrad classes, it’s that power consumption increases with frequency, so I imagine the higher than expected power consumption of multiplier 2 is due to its much faster operating frequency. This will make me think about possible 555 timer inhibit strategies to help mitigate that in future designs.

The fun scale is totally arbitrary. Is it linear? Non-linear? What type of project would be a zero? What would be a ten? Folks, there are many unanswered questions in the universe. These will have to be added to the list.

Building both of these multipliers was really fun (well, apparently they were 70% and 90% fun, respectively), and really solidified my comfort with the process of binary multiplication. I know that I teach the topic, but building something always makes it more real to me. Feel free to try it out yourself!