# A Quad-Output Elastic Switched Capacitor Converter and Per-Core LDO with 87% Power Efficiency and 2.5x Core-Frequency Range Improvement

Samantak Gangopadhyay\*, James W. Tschanz\*\*, and Arijit Raychowdhury\*

\* School of ECE, Georgia Institute of Technology, GA, USA, \*\* Intel Corporation, Hillsboro, OR, USA

Abstract—A quad-output elastic switched capacitor converter with four cores and per-core digital low dropout regulators (LDOs) is designed in 130nm CMOS. This design routes power on demand by sharing the total switching capacitance network across all the cores and delivering power to each core in a time interleaved manner. As the current demand of a core increases, more switching capacitance and switch area resources are automatically allotted to the core. In case of further power demand, if the power delivery module can no longer allocate further resources, then it autonomously changes the voltage conversion ratio till the demand is met. Measurements reveal 87% peak power efficiency and 2.5x increase in core-frequency range, thus enabling wider dynamic voltage and frequency scaling (DVFS).

Keywords—Switched Capacitor Network, Multi-Ratio Switched Capacitor, Flexible SC, Phase locked LDOs, DVFS, CMOS

## I. INTRODUCTION

With an increasing number of power domains, fine-grain per-core DVFS and decreasing decoupling capacitance per domain, power delivery and management in digital SoCs continue to pose serious challenges. Switched capacitor (SC) converters have gained popularity due to their high efficiency and ease of on-chip integration [1]. However, SC converters provide high efficiency only within a limited input and output range, as they are designed and optimized for discrete conversion ratios. Multi-ratio SC DC-DC converters provide high energy efficiency at multiple conversion ratios and can therefore extend this range [2–6]. The power delivery system in [5] provides multiple ratios by cascading several SC DC-DC units, each with a conversion ratio of ½. Unfortunately, cascading losses and the output capacitor required after each stage can make the design inefficient. [2] uses a common integrated controller to produce three different voltage levels as SC outputs; however, the three outputs are produced by three separate SC converter units. In [3] a reconfigurable dual-output SC regulator is presented, but the two outputs remain fixed to a 2X and a 3X conversion ratio. While multi-ratio SCs extend the range of high-efficiency conversion, they suffer from limited energy density due to low on-die capacitance density. To address this, [7, 8] introduce the idea of dynamic resource allocation. In [7] an integrated dual-output SC converter with dynamic power-cell allocation is presented. For optimal power efficiency, the design redistributes capacitance resources until both SC channels operate at the same switching frequency. Although it is a single-ratio SC converter, and the distribution logic of such a system can become significantly more complex for three or more cores, it provides a foundation for the philosophy of flexible resource allocation based on load requirement to enhance the overall power efficiency.

In this paper, we present a quad-output elastic SC (QOESC) converter with per-core LDO (Fig. 1) that provides a regulated voltage supply to 4 cores. As opposed to a baseline design



Fig. 1. Detailed top-level structure of the Quad-Output Elastic Switched Capacitor Converter supplying power to 4 cores.

where an SC converter (SCC) is dedicated to each core, the current design routes power on demand by sharing the total capacitance network across all the cores and delivering power to each core in a time-interleaved manner. As the current demand of a particular core increases (as indicated by the duty-cycle of the local phase-based LDO [9]), more resources are dynamically and autonomously allotted to the core. If the power demand increases further, the corresponding SCC moves to a higher output voltage by dynamically switching the conversion ratio. The proposed topology theoretically allows one core to run at a power of approximately 4P<sub>MAX</sub> while the others are in standby (nearly zero power), as opposed to a baseline design where each core can run at a maximum power of P<sub>MAX</sub> only. The proposed design advances the state of the art in three ways: (i) it demonstrates an SC design with flexible resource sharing among more than two outputs (four outputs), (ii) it can provide multiple conversion ratios to each of the outputs, and (iii) the efficient design, augmented by resource sharing, achieves 87% power efficiency and a 2.5x core-frequency range enhancement.

## II. ARCHITECTURE, DESIGN & PRINCIPLE OF OPERATION

Fig. 1 shows the top-level structure of the QOESC supplying power to 4 cores. As shown, each core has its own LDO. We use a phase-locked digital LDO (PLDO) [9], as it provides a convenient measure of the load current through the duty-cycle of the gate pulses of its power PFETs. For the SC design, we use the Extended Binary (EXB) scheme, which uses two flying capacitors to produce 3 ratios with ¼ resolution [10]. The EXB scheme provides a single-stage solution for DC-DC power conversion; it therefore avoids cascading losses and does not require any intermediate output-stage capacitance.

The detailed architecture of the QOESC test-chip is shown in Fig. 2(a). The figure shows only a single core for ease of representation. The capacitance and switch resources are divided into 32 identical resource slices, each forming a unit EXB SC block. The QOESC design routes power based on the load current demand of the individual cores. The current demand is indicated by the duty-cycle of the input pulse width modulated



Fig. 2. (a) Block level diagram for QOESC architecture (b) Decision flow chart for resource allocation in QOESC architecture.

(PWM) signals of the PFETs of the local phase-based LDO. The PWM signal forms the input of a 32-bit counter that uses the same reference clock as the PLDO. The counter output, duty\_cycle<sub>LDO PFET IN</sub>, is a digital signature of the load current and is compared to preset duty-cycle thresholds using a digital comparator. If the duty-cycle is found to be high (low), resource slices are added to (removed from) the core by increasing (decreasing) the number of interleaving cycles assigned to it. If the duty-cycle of the PLDO for a core remains above the upper duty-cycle threshold even after the maximum number of resource slices has been reached, the SCC responds by increasing the conversion ratio. A similar mechanism handles the case when the duty-cycle is below the lower threshold. Fig. 2(b) shows the decision flow for resource slice allocation in the QOESC architecture. The SC network (SCN) is run at the clock frequency that provides the highest efficiency for the core with the highest power consumption (typically this is also the core with the highest voltage conversion ratio). This is implemented through a scan-programmable VCO, whose frequency is set by the highest SCN conversion ratio. Cross-domain regulation/noise is minimized by (1) detecting voltage droops at a core with a high-speed droop detector and (2) temporarily boosting the SCN clock from the steady-state frequency ($F_{\mathrm{SW,STEADY}} = 10$-$15$ MHz) to a transient frequency ($F_{\mathrm{SW,TRANSIENT}} = 30$ MHz), as shown in Fig. 3. This allows the flying capacitors to quickly replenish their charge and reduces cross-domain noise propagation.
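The allocation flow described above can be summarized as a small behavioral model. The following Python sketch is illustrative only: the threshold values (`hi`, `lo`), the starting allocation and the names `CoreState` and `update_allocation` are assumptions made for this example and are not taken from the test-chip.

```python
from dataclasses import dataclass

RATIOS = (0.25, 0.5, 0.75)  # conversion ratios supported by the SCN


@dataclass
class CoreState:
    slices: int = 8      # hypothetical starting allocation (a quarter of 32)
    ratio_idx: int = 1   # index into RATIOS


def update_allocation(core, duty, free_slices, hi=0.8, lo=0.3):
    """Apply one decision step to a core and return the new free-slice count.

    duty        : measured PLDO duty-cycle, normalized to [0, 1]
    free_slices : resource slices not currently assigned to any core
    hi, lo      : assumed upper and lower duty-cycle thresholds
    """
    if duty > hi:
        if free_slices > 0:
            # Demand is high and resources remain: add one slice.
            core.slices += 1
            free_slices -= 1
        elif core.ratio_idx < len(RATIOS) - 1:
            # No slices left: move to the next higher conversion ratio.
            core.ratio_idx += 1
    elif duty < lo:
        if core.slices > 1:
            # Demand is low: release one slice to the shared pool.
            core.slices -= 1
            free_slices += 1
        elif core.ratio_idx > 0:
            # Already at the minimum allocation: step the ratio down.
            core.ratio_idx -= 1
    return free_slices
```

Calling `update_allocation` once per control period for each core reproduces the two-level behavior of the flow: slices are exchanged first, and the conversion ratio changes only when the slice count has hit a limit.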

The SC network (SCN) uses the EXB scheme, mentioned before, to generate multiple step-down ratios. Fig. 3 shows the top-level block diagram for the interleaving and resource sharing schemes. The current design supports conversion ratios



Fig. 3. Detailed top-level structure of interleaving and resource sharing scheme loop control.

of 3/4, 1/2 and 1/4. Further, for each output, 32 time-interleaved phases are generated. Although 16 phases are enough for resource sharing, 32-phase interleaving is used because it reduces the voltage ripple at the SCN output. The 32-stage time interleaving is realized through 7 circular 32-bit shift registers (bank1), one for each of the switch controls, as shown in the columns of the switch control table in Fig. 4. The resource sharing is implemented through 4 circular 32-bit shift registers (bank2) that drive the connection switches of each resource slice to the 4 cores, as shown in Fig. 1. The 32 unit SC blocks obtain their phase inputs from the shift registers in bank1 and generate different ratios in a periodic sequence. The registers in bank2 are responsible for making the correct connection between the desired ratio and the desired core. For each of the interleaved SC slices, a sequence of ratios is generated periodically based on the inputs provided by the bank1 registers, and the output voltage is directed to the desired core through additional switches controlled by the registers in bank2. It takes 4 cycles to generate each ratio; therefore 8 ratios are generated in 32 cycles.
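As an illustration of the bank2 routing idea, the following Python sketch models four circular 32-bit registers, one per core, where slice k taps bit k and the registers rotate by one position per clock. The contiguous pattern layout and the helper names (`build_bank2`, `rotate`, `connected_core`) are assumptions made for this example; the paper does not specify the register contents.

```python
CYCLES_PER_RATIO = 4                  # cycles needed to generate one ratio
N_BITS = 32                           # register length and number of slices
SLOTS = N_BITS // CYCLES_PER_RATIO    # 8 ratio slots per 32-cycle period


def build_bank2(slots_per_core):
    """Return one 32-entry bit pattern per core from a per-core slot count."""
    assert sum(slots_per_core) == SLOTS
    regs = [[0] * N_BITS for _ in slots_per_core]
    pos = 0
    for core, n_slots in enumerate(slots_per_core):
        # Each slot occupies 4 consecutive bit positions of that core's register.
        for _ in range(n_slots * CYCLES_PER_RATIO):
            regs[core][pos] = 1
            pos += 1
    return regs


def rotate(reg, k=1):
    """Circularly shift a register by k positions."""
    k %= len(reg)
    return reg[-k:] + reg[:-k]


def connected_core(regs, slice_idx, t):
    """Core that slice `slice_idx` feeds at clock tick t (None if unconnected)."""
    for core, reg in enumerate(regs):
        if rotate(reg, t)[slice_idx]:
            return core
    return None


# Example: Core 0 is heavily loaded and receives 5 of the 8 slots.
regs = build_bank2([5, 1, 1, 1])
share = sum(connected_core(regs, 0, t) == 0 for t in range(N_BITS))
print(f"slice 0 serves core 0 during {share} of {N_BITS} ticks")
```

Because exactly one register holds a 1 at every bit position, each slice is connected to exactly one core at any tick, and changing a core's share amounts to rewriting the bit patterns.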

QOESC uses the Extended Binary (EXB) scheme to generate multiple step-down ratios in binary resolution [10]. Unlike the conventional binary representation, EXB refers to a modified signed-digit representation with 0, 1 and -1 as its numerals. This allows multiple representations of the same number through non-unique EXB codes. The various EXB codes of a given number  $N \in (0,1)$  with a resolution of n bits can be translated into different sequences of SCC topologies that finally create an output voltage such that the ratio of  $V_{OUT}$  to



Fig. 4. Extended binary switched capacitor converter circuit diagram and switch control tables for ¾, ½ and ¼ ratios.



Fig. 5. Block Diagram of PLDO and the prototype core

 $V_{IN}$  is equal to N. For such a step-down SCC, the circuit consists of a voltage source  $V_{IN}$ , n flying capacitors and an output load. The scheme allows for a multi-output SCN design through an arrangement of flying capacitors and reconfiguration switches and can generate  $2^n-1$  ratios. In the implemented design n=2; therefore we can generate the  $\frac{3}{4}$ ,  $\frac{1}{2}$  and  $\frac{1}{4}$  ratios. The circuit diagram and the switch control tables for generating these ratios are provided in Fig. 4.
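To make the non-uniqueness of EXB codes concrete, the following Python sketch enumerates signed-digit codes for a given n. It assumes the code format N = A0 + Σ A_j·2^(-j) with A0 ∈ {0, 1} and A_j ∈ {-1, 0, 1} for j = 1..n, which is the form commonly used in the EXB literature; this paper does not spell out the format, so the leading digit A0 and the function name `exb_codes` should be read as assumptions.

```python
from fractions import Fraction
from itertools import product


def exb_codes(n):
    """Map each ratio N in (0, 1) to the EXB codes (A0, A1, ..., An) encoding it."""
    codes = {}
    for a0 in (0, 1):
        for digits in product((-1, 0, 1), repeat=n):
            # Digit A_j carries the weight 2**-j.
            value = a0 + sum(
                Fraction(d, 2 ** j) for j, d in enumerate(digits, start=1)
            )
            if 0 < value < 1:
                codes.setdefault(value, []).append((a0, *digits))
    return dict(sorted(codes.items()))


# n = 2 corresponds to the two flying capacitors of the implemented design.
for ratio, reps in exb_codes(2).items():
    print(ratio, reps)
```

Under these assumptions, n = 2 yields exactly the three ratios 1/4, 1/2 and 3/4, each with more than one code; every code corresponds to a different sequence of capacitor topologies that produces the same ratio.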

The SCN produces discrete output voltage levels, which are regulated via per-core LDOs (Fig. 5). The current design uses a phase-locked LDO (PLDO) with 16 parallel phases [9]. The PLDO uses two clocks: F<sub>REF</sub>, the output of a reference voltage-controlled oscillator (VCO), and F<sub>LOC</sub>, generated by a local VCO (LVCO) that is powered by  $V_{REG}$. Fig. 5 illustrates the circuit implementation with a divide ratio of N=1. In steady state, F<sub>REF</sub> becomes equal to F<sub>LOC</sub>/N, and the phase difference between F<sub>REF</sub> and F<sub>LOC</sub> locks to a constant value that turns the power PMOS 'on' for exactly the duration that the load current demands to keep V<sub>REG</sub> constant. The phase locking occurs at each stage of the JC, and the total current provided by all the PMOS devices in a time-interleaved manner enables voltage regulation. For each phase, the duty-cycle of the PWM signal at the input of the PLDO's power PFET indicates the current demand of the local core. A high-speed clock samples the PWM signal of the first phase to produce a 4-bit digital representation of the load.
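The conversion of the sampled PWM signal into a 4-bit load code can be sketched as follows. This is a minimal behavioral model: the number of samples per PWM period, the truncating quantization and the function name `duty_code` are assumptions for illustration, not details of the implemented counter.

```python
def duty_code(pwm_samples, bits=4):
    """Quantize the fraction of high samples in one PWM period to a `bits`-wide code.

    pwm_samples : sequence of 0/1 values captured by the high-speed sampling clock
    bits        : width of the digital load code (4 in the text above)
    """
    if not pwm_samples:
        raise ValueError("need at least one sample")
    high = sum(1 for s in pwm_samples if s)
    full_scale = (1 << bits) - 1  # 15 for a 4-bit code
    # Integer arithmetic mirrors a counter followed by truncation.
    return (high * full_scale) // len(pwm_samples)


# A PWM period sampled 16 times with the PFET gate asserted for 12 samples.
print(duty_code([1] * 12 + [0] * 4))
```

A larger code indicates that the PFET conducts for a larger fraction of the period, which is the signature of a higher load current that the allocation logic compares against its thresholds.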

The power network has 4 cores as its load. Each core consists of an SRAM array, an ALU, an instruction decoder and a three-stage pipeline (Fig. 5). In addition, scan-programmable DC load circuits and high-speed noise generation circuits are integrated to mimic a large dynamic load range and abrupt load steps.

## III. MEASURED RESULTS

The design is fabricated in 130nm CMOS and occupies an area of 2mm x 2mm. The flying capacitance is divided into 32 equal units that are distributed evenly across the chip and implemented using dual-mimcap capacitors. In order to measure the power consumption of individual cores, the supply voltage of each core is connected to an I/O (input/output) pad. The 4 cores are heterogeneous in functionality and area, and their load capacitance ranges from 400pF to 700pF. The chip micrograph is shown in Fig. 6.

Fig.7 shows the power efficiency of the SCC at Core1 for the three ratios as a function of the output load current (all



| Parameter               | Value                |
|-------------------------|----------------------|
| Process                 | GF 130 nm 8M CMOS    |
| Package                 | QFN                  |
| Chip dimensions         | 2mm x 2mm            |
| Flying capacitance area | 0.58 mm<sup>2</sup>  |
| Switch area             | 0.30 mm<sup>2</sup>  |
| Controller area         | 0.1 mm<sup>2</sup>   |
| Input VDD               | 1V-1.2V              |
| Output Voltage          | 0.15V-0.9V           |
| Load Current            | 0.06mA -5            |

Fig. 6. Chip micrograph and characteristics.



Fig. 7. Measured SC power efficiency with respect to (a) varying load current (b) varying output voltage.

the resources are allocated to Core-1). The power efficiency of the power delivery network (SCC+LDO) as a function of the output voltage is measured at Core1 (Fig. 7(a)), showing peak efficiencies of 87%, 81% and 67% for SCN ratios of 3/4, 1/2 and 1/4, respectively. Fig. 7(b) plots power efficiency against output voltage for a constant load current of 1mA. The graph demonstrates the typical behavior of a multi-ratio SC design: the three peaks correspond to the three target ratios of 3/4, 1/2 and 1/4. The output voltage is measured as a function of the output load current for the proposed design and compared with a baseline design. The baseline design is created by allocating 1/4th of the SCN resources (capacitance and switch area) to each core, with no adaptation allowed, so that each core has a dedicated and fixed SCC and LDO. The data for both the baseline and the proposed design are obtained from the test-chip. In Fig. 8, we note a more than 2X increase in output current at iso-output voltage and a 64%, 50% and 43% increase in the output voltage for SCN ratios of 3/4, 1/2 and 1/4, respectively. As the load current increases, the output voltage falls from its ideal value (V<sub>0</sub>, when the load current is 0 mA) due to the internal resistance of the SCC. By design, the maximum drop tolerated during test-chip based measurement and analysis is 1/3 of the ideal output voltage. Below this, the output voltage is too low for correct operation of the digital load. Power efficiency is measured as a function of output power for the proposed and baseline designs for all three ratios, and the results are shown in Fig. 9. We note a 2-2.7X increase in the output power as well as a 68%-90% peak increase in power efficiency in the proposed design. Similarly, the output ripple of the SCN (which is indicative of the total SCN losses) is measured for the three ratios for the proposed and the baseline designs and shows a 43% to 52% reduction in ripple (Fig. 10). Fig. 11(a) shows less than 600ns of wake-up time for the SCC+LDO as the four cores are simultaneously enabled. An oscilloscope capture of a full 1 mA load step, at a design test point where the target dropout across the PLDO is 300 mV, shows a droop recovery time of 650ns through the dual-loop SCC+LDO feedback (Fig. 11(b)).

Fig. 8. Measured output voltage of proposed vs baseline design vs varying load current shows improvement of 43-64%.

Fig. 9. Measured power efficiency of proposed vs baseline design by varying output power shows increase of 68-90% in efficiency.

Fig. 10. Measured output voltage ripple of proposed vs baseline design for different load current shows improvement of 43-50%.

Fig. 11. Measured scope capture showing (a) boot-up of all the 4 cores using QOESC (b) demonstrating the regulation under 1mA load step.

Fig. 12. (a) Measured power vs frequency for one of the cores, showing improved operating range. (b) Measured data shows coupling on steady state cores can be reduced by transient boosting.
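The load-line behavior described above (output voltage falling from V<sub>0</sub> with load current, with a tolerated drop of 1/3 of V<sub>0</sub>) can be illustrated with a first-order model. The Python sketch below assumes that the SCC behaves as an ideal source V<sub>0</sub> in series with an output resistance that scales inversely with the number of allocated slices; the per-slice resistance `R_SLICE_OHM` is a hypothetical value chosen for illustration and is not a measured quantity.

```python
import math

R_SLICE_OHM = 1600.0  # hypothetical output resistance of a single resource slice


def output_voltage(ratio, v_in, i_load, r_slice, n_slices):
    """First-order load line: ideal voltage minus the drop across R_out."""
    r_out = r_slice / n_slices  # assumed: parallel slices divide the resistance
    return ratio * v_in - i_load * r_out


def max_load_current(ratio, v_in, r_slice, n_slices, max_drop_frac=1 / 3):
    """Largest load current that keeps the drop within max_drop_frac of V0."""
    v0 = ratio * v_in
    r_out = r_slice / n_slices
    return max_drop_frac * v0 / r_out


baseline = max_load_current(0.75, 1.2, R_SLICE_OHM, n_slices=8)   # fixed 1/4 share
elastic = max_load_current(0.75, 1.2, R_SLICE_OHM, n_slices=32)   # all slices
print(f"baseline {baseline * 1e3:.2f} mA, elastic {elastic * 1e3:.2f} mA")
assert math.isclose(elastic / baseline, 4.0)
```

In this idealized model, giving one core all 32 slices instead of 8 raises its maximum load current by 4x, in line with the theoretical 4P<sub>MAX</sub> bound stated in the introduction. The measured gains reported here (more than 2X in current, 2-2.7X in power) are lower than this ideal figure.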

As a result of the increased operating range from the

| Work                                 | This Work     | [2]           | [4]           | [8]           | [7]       |
|--------------------------------------|---------------|---------------|---------------|---------------|-----------|
| Technology(nm)                       | 130           | 180           | 65            | 65            | 28        |
| Topology                             | Step-down     | Step-down     | Step-up/down  | Step-down     | Step-down |
| Number of outputs                    | 4             | 3             | 2             | 2             | 2         |
| Passive                              | On-chip       | On-chip       | On/Off-chip   | On-chip       | On-chip   |
| V<sub>IN</sub> (V)                   | 1-1.2         | 0.9-4         | 0.85-3.6      | 2.3           | 1.3-1.6   |
| V<sub>OUT</sub> (V)                  | 0.15-0.9      | 0.6,1.2,3.3   | 0.1-1.9       | 0.742-1.367   | 0.4-0.9   |
| Total Capacitance (nF)               | 4             | 3             | 1000          | N/A           | 8.1       |
| Peak Power Efficiency (η<sub>PEAK</sub>) | 87%       | 81%           | 95.8%         | 70.9%         | 83%       |
| Max load per Output(mA)              | 6.4           | 0.033         | 1 or 10       | 12            | 100       |
| Regulation                           | LDO           | Freq-mod      | Freq-mod      | Freq-mod      | Freq-mod  |
| Multi-Ratio                          | Yes(3 ratios) | Yes(3 ratios) | Yes(6 ratios) | Yes(2 ratios) | No        |
| Fully Integrated                     | Yes           | No            | No            | Yes           | Yes       |
| Elastic SC allocation                | Yes           | No            | Partial       | Yes           | Yes       |
| Power density (µW/mm<sup>2</sup>)    | 1800          | 250           | N/A           | 550000        | 150000    |

TABLE I. COMPARISON TABLE WITH OTHER SC TOPOLOGIES.

QOESC converter, the voltage-frequency trade-off of a core shows an extended range of 18% in power and 2.5x in operating frequency, thus enabling new DVFS states per core (Fig. 12(a)). To understand cross-domain noise behavior, the following set-up is used: Core4 experiences a load current step of 2mA and, consequently, a droop of 170 mV. Due to cross-regulation, this impacts the other cores as well. The droop measured on the other cores is shown in Fig. 12(b) (blue bars). Transient boosting, i.e. increasing the switching frequency of the SC power converter during a voltage droop, reduces cross-domain noise by as much as 85%, as shown in Fig. 12(b) (red bars). Table I shows that the design achieves competitive metrics compared to state-of-the-art designs.
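The transient-boosting policy can be expressed as a simple clock-selection rule. The Python sketch below is a behavioral illustration only: the 30 MHz transient frequency and the 10-15 MHz steady-state range are the values quoted in Section II, while the droop threshold, the choice of 10 MHz as the default steady-state value and the function name `scn_clock_mhz` are assumptions.

```python
F_SW_TRANSIENT_MHZ = 30.0  # boosted SCN clock used during a droop
F_SW_STEADY_MHZ = 10.0     # assumed default; the steady-state range is 10-15 MHz


def scn_clock_mhz(core_voltages, targets, droop_threshold_v=0.05,
                  f_steady=F_SW_STEADY_MHZ):
    """Return the SCN clock frequency given per-core voltages and targets.

    A droop is flagged when any core falls more than `droop_threshold_v`
    (a hypothetical detector threshold) below its target voltage.
    """
    droop = any(t - v > droop_threshold_v
                for v, t in zip(core_voltages, targets))
    return F_SW_TRANSIENT_MHZ if droop else f_steady


# Core4 droops by 170 mV after a load step, so the shared clock is boosted.
print(scn_clock_mhz([0.60, 0.60, 0.60, 0.43], [0.60] * 4))
```

Because the slices are shared, a droop on any one core boosts the clock for the whole network, which lets the flying capacitors recharge quickly before the disturbance couples into the other cores.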

## IV. CONCLUSION

A quad-output elastic SCC with per-core LDO shows a peak efficiency of 87% and a 2.5x increase in operating frequency range, achieved by dynamically allocating SC and switch resources through an all-digital FSM.

## ACKNOWLEDGMENT

This work was funded by the Semiconductor Research Corporation (Task no 1836.140), and Intel Corp.

## REFERENCES

- [1] MD Seeman et al. "The future of integrated power conversion: The switched capacitor approach". In: *IEEE COMPEL Workshop*. 2010, pp. 1430–1434.
- [2] Wanyeong Jung et al. "8.5 A 60%-efficiency 20nW-500μW tri-output fully integrated power management unit with environmental adaptation and load-proportional biasing for IoT systems". In: *2016 IEEE International Solid-State Circuits Conference (ISSCC)*. IEEE. 2016, pp. 154–155.
- [3] Zhe Hua and Hoi Lee. "A Reconfigurable Dual-Output Switched-Capacitor DC-DC Regulator With Sub-Harmonic Adaptive-On-Time Control for Low-Power Applications". In: *IEEE Journal of Solid-State Circuits* 50.3 (2015), pp. 724–736.
- [4] Chen Kong Teh and Atsushi Suzuki. "12.3 A 2-output step-up/step-down switched-capacitor DC-DC converter with 95.8% peak efficiency and 0.85-to-3.6 V input voltage range". In: *2016 IEEE International Solid-State Circuits Conference (ISSCC)*. IEEE. 2016, pp. 222–223.
- [5] Loai G Salem and Patrick P Mercier. "4.6 An 85%-efficiency fully integrated 15-ratio recursive switched-capacitor DC-DC converter with 0.1-to-2.2 V output voltage range". In: *2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*. IEEE. 2014, pp. 88–89.
- [6] Dima Kilani et al. "A Dual-Output Switched Capacitor DC-DC Buck Converter Using Adaptive Time Multiplexing Technique in 65-nm CMOS". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 99 (2018), pp. 1–10.
- [7] Junmin Jiang et al. "20.5 A dual-symmetrical-output switched-capacitor converter with dynamic power cells and minimized cross regulation for application processors in 28nm CMOS". In: *2017 IEEE International Solid-State Circuits Conference (ISSCC)*. IEEE. 2017, pp. 344–345.
- [8] Ivan Bukreyev et al. "Four Monolithically Integrated Switched-Capacitor DC–DC Converters With Dynamic Capacitance Sharing in 65-nm CMOS". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 65.6 (2018), pp. 2035–2047.
- [9] Samantak Gangopadhyay et al. "A 32 nm embedded, fully-digital, phase-locked low dropout regulator for fine grained power management in digital circuits". In: *IEEE Journal of Solid-State Circuits* 49.11 (2014), pp. 2684–2693.
- [10] Alexander Kushnerov and Sam Ben-Yaakov. "Algebraic synthesis of Fibonacci switched capacitor converters". In: *2011 IEEE International Conference on Microwaves, Communications, Antennas and Electronics Systems (COMCAS)*. IEEE. 2011, pp. 1–4.