To understand what makes Toshiba / Kioxia's BiCS TLC NAND different from the typical "ONFI" NAND, a bit of background information is required. The foundation of all Not-AND (aka 'NAND') solid state storage is a tiny electrical gate paired with a tiny 'battery'-like charge cell, together forming a transistor. To change the stored data bit from a 0 to a 1, or vice versa, the controller tells the gate to either let the charge 'out' (aka 'discharge the cell') or push some in (aka 'charge the cell'). This is called a Program Operation, as it is literally setting the charge state of a cell, or 'programming' it.
To read a NAND cell the controller simply applies a tiny trickle of voltage across the gate and senses that cell's charge (via its resistance). For example, in SLC or Single Level Cell NAND the cell has two states: full and empty. Counterintuitively, since the controller is reading a difference in voltage across the cell, the default 'empty' state is usually considered a 1 and not a 0. So an 'empty' or discharged state is the same as storing a 1 in that cell, and fully charged is a 0 bit. For old timers, this is why in AS Cleaner or 'Tony Trim' type programs (before TRIM was a sure thing) you had to use the 'FF' write option… as 'FF' is the same as emptying all the cells via a "1" being written to each and every cell.
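For the curious, here is a tiny Python sketch (purely illustrative – no controller works in Python) showing how the 'erased equals 1' convention plays out at the byte level, and why an 'FF' fill amounts to leaving every cell discharged:

```python
# Illustrative only: SLC convention where an erased (discharged) cell reads as 1
# and a programmed (charged) cell reads as 0.
ERASED, PROGRAMMED = 1, 0

def byte_to_cells(value: int) -> list[int]:
    """Map one byte onto eight SLC cells (bit value == cell read value)."""
    return [(value >> i) & 1 for i in range(8)]

# Writing 0xFF is eight 1-bits, i.e. every cell in its erased state – which is why
# old 'FF fill' utilities effectively emptied the cells before TRIM was dependable.
assert byte_to_cells(0xFF) == [ERASED] * 8
assert byte_to_cells(0x00) == [PROGRAMMED] * 8
print(byte_to_cells(0xA5))  # a mixed pattern: some cells charged, some left erased
```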
Since a NAND transistor is capable of holding a charge even when the system is powered off, NAND storage is considered non-volatile (unlike RAM, which loses its data the moment the power is turned off). However, just like 'bit rot' on a Hard Disk Drive, this state is not permanent. Instead, and just like the battery in your turned-off phone, if the SSD is left long enough without being powered on, the charge in that transistor will leak out… and turn a 0 into a 1. Needless to say, this is the epitome of "A Bad Thing™" from a data reliability point of view.
With SLC NAND, a cell that is left alone will not lose all that much voltage and is fairly stable for months. However, as each NAND cell can only store 1 bit, it is rather expensive to create, say, a 512GB/4Tbit SLC NAND IC… and scaling up beyond a mere terabyte of total storage requires a lot of NAND chips. Even layering NAND dies on top of NAND dies on top of NAND dies, the resulting footprint would be rather large to say the least. This combination is why Multi-Level Cell NAND was created; it doubled a NAND cell's storage capacity from 1 bit to 2. It did this by adding more voltage threshold points between 'full' and 'empty'. So instead of two states there were now four: empty, roughly one-third charged, roughly two-thirds charged, and full. This certainly increased storage density, and decreased the cost of manufacture to hit a given storage capacity, but at the same time it made the self-discharging of NAND a greater issue – as a lot less voltage drift is now enough to corrupt two bits of data. Thus MLC is not considered as stable as SLC and has to be refreshed / rewritten more often (aka 'internal housecleaning').
As time went by MLC was in turn supplanted by TLC (Triple-Level Cell) NAND, which can store three bits of data per cell. In TLC NAND there are not two, not four, but eight voltage states (and for comparison's sake, hitting 4 bits per cell with QLC NAND requires sixteen voltage states). This in turn increases the precision needed to properly write and read a given state of a NAND cell. It also significantly decreases how long a TLC NAND cell will hold this more precise voltage state. This increased precision plays a large role in why TLC NAND is 'slower' at write operations than MLC (which is in turn slower than SLC).
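The arithmetic behind the 'more bits means tighter margins' point is simple enough to show in a few lines of Python (the 'margin' here is just a fraction of an idealised 0-to-1 charge range, not a real silicon voltage):

```python
# Rough arithmetic: bits per cell -> distinct charge states -> share of the charge
# window each state gets. Purely illustrative, not datasheet voltages.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    states = 2 ** bits        # distinct charge levels the cell must reliably hold
    margin = 1 / states       # fraction of the charge range allotted to each state
    print(f"{name}: {states:2d} states, ~{margin:.1%} of the range per state")
# SLC gets ~50% of the range per state, TLC only ~12.5%, QLC a mere ~6% –
# hence the slower, more careful program and read steps as bits per cell climb.
```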
We mention all this because back in 2007 Toshiba's storage division understood this underlying issue and took steps to ensure that their NAND would hold a charge longer, not become damaged as quickly as ONFI designs, and could even be written to faster than ONFI NAND designs. In typical NAND designs the gate is a 'floating' gate (FG for short) and is a conductor of electricity. So, in order for the drive's controller to change a NAND cell's state it pushes electricity through the gate itself. Over time this current running through the floating gate causes minuscule fractures in the crystal lattice that makes up its foundation. On its own a single fracture will not cause voltage related issues, but over time the cumulative effect does create a significant problem. When this happens, the end result is a gate which is incapable of keeping voltage levels inside the transistor stable. This is what a 'dead' NAND cell really is – a NAND cell that can no longer hold a precise charge.
Worse still, with every reduction in fabrication size not only does the cell become smaller, the gate becomes smaller as well. A smaller gate means fewer defects are required before the gate fails. Also, the more precise the voltage requirements, the faster accumulated defects become an issue. By creating '3D' NAND designs, NAND manufacturers have been able to create larger cells with larger gates, but the underlying issue of using a classic FG design remains.
In layman's terms, a floating gate literally 'eats itself' every time it has to change the state of its transistor. The smaller the gate, or the more precision required, the faster it finishes its meal… and the faster the cells die. NAND manufacturers are well aware of this issue and include spare NAND cells to replace 'dead' cells. They are also fully aware that this is only a band-aid solution, as eventually there will be no more replacement cells left.
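To make the 'spare cells' idea concrete, here is a minimal, hypothetical sketch of a remap table that swaps dead blocks for spares. Real flash translation layers are vastly more sophisticated, but the basic bookkeeping looks something like this:

```python
# Minimal sketch: retire dead blocks by mapping them onto a finite pool of spares.
class BlockRemapper:
    def __init__(self, user_blocks: int, spare_blocks: int):
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.remap = {}  # logical block -> replacement physical block

    def physical(self, block: int) -> int:
        """Where a logical block actually lives right now."""
        return self.remap.get(block, block)

    def retire(self, block: int) -> bool:
        """Mark a block dead; only works while replacement blocks remain."""
        if not self.spares:
            return False  # out of spares – the 'band-aid' has run out
        self.remap[block] = self.spares.pop()
        return True

ftl = BlockRemapper(user_blocks=1024, spare_blocks=32)
ftl.retire(17)
print(ftl.physical(17), ftl.physical(18))  # block 17 now points at a spare, 18 unchanged
```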
Controller and storage companies are also aware of this issue and not only include even more 'replacement cells' (aka 'over-provisioning', and why your 1TB drive may be seen as a 960GB drive) but also use slower / gentler write algorithms to reduce this damage. In simplistic terms this means they write to the NAND using less voltage but over a longer period of time. This does reduce cell damage, but at the cost of write performance. To overcome this side effect, the controller will pretend a given chunk of TLC NAND is SLC NAND and only store two charge states instead of eight (or sixteen in the case of QLC) per cell. This allows those cells to continue being used long after the amount of damage to the gate would result in random bit changes if eight (or more) states were needed. This "pseudo-SLC" is then used as a buffer for real time writes and is called different things by different manufacturers. For example, Crucial calls it "Dynamic Write Acceleration".
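The numbers involved are back-of-the-envelope simple. Assuming a drive with 1024GB of raw TLC exposing 960GB to the user, and an entirely hypothetical 30GB slice of that TLC set aside as a pseudo-SLC cache:

```python
# Back-of-the-envelope over-provisioning and pseudo-SLC maths. All figures are
# assumptions for illustration – every vendor sizes these differently.
raw_gb, user_gb = 1024, 960
op_gb = raw_gb - user_gb
print(f"over-provisioning: {op_gb} GB ({op_gb / raw_gb:.1%} of raw NAND)")

# A TLC block run in pseudo-SLC mode stores 1 bit per cell instead of 3, so a
# 30 GB slice of raw TLC yields only ~10 GB of fast SLC-mode buffer.
pslc_raw_gb = 30
print(f"pSLC buffer: ~{pslc_raw_gb / 3:.0f} GB usable")
```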
So, with all that background information out of the way, let's start by saying that Bit Cost Scaling ("BiCS") NAND is not based upon a floating gate design. Instead it is a Charge Trap gate design (aka CT or CTF). In CTF based NAND the gate is not a conductor; instead it is an insulator. So instead of being responsible for conducting charge into the cell, it is responsible for trapping it there. This means that during write or erase operations less stress is placed on the gate itself. The end result is that those deadly fractures are less likely to occur, and when they do occasionally happen they are tiny – on the scale of individual electrons. Since they are smaller, they can only drain off the electrons that are 'touching' the fracture, leaving the rest of the electrons in the cell to hold the voltage steady enough that the controller can still accurately tell what the cell's state is.
While yes, eventually fractures in the gate's lattice will multiply to the point that they too cause the cell to be marked as 'dead', the lower likelihood and severity mean a CTF based NAND transistor will last noticeably longer than a floating gate design. Equally important, this in turn not only increases the longevity of the NAND cell but also allows for faster – and yet ironically enough still gentler – programming cycles to be implemented on the NAND. Which in turn increases sustained write performance. Charge Trap gates also gain longevity and performance from the switch from planar (aka 2D) NAND designs to 3D designs. Arguably CT/CTF NAND gains even more durability, as it does not 'eat itself' nearly as fast as FG NAND – so the increased mass means the NAND cells last even longer. When paired with pseudo-SLC algorithms the end result is faster performance, increased durability, and generally a better end-user experience.
Due to the unique gate design of the cell in BiCS NAND, the layout of the cells in the NAND is also different. In typical floating gate based '3D' NAND, the cells are laid out much like a large apartment building with X floors per 'building'. So for 64-layer '3D' NAND you would have a 'building' that was 64 floors tall with so many 'apartments' per floor. Further extending the analogy, the 'hallways' and 'elevators' are the control pathways upon which data is transferred from the NAND cells to the controller and back again (i.e. the interconnects the controller uses).
As the interconnects are optimized for a small footprint (so more 'apartments' can be fit in the 'building'), when the controller needs to erase the data to make the cells ready for future writes it has to move the data in smaller chunks lest the interconnect bus become saturated. This is basically why floating gate NAND can usually only work through one page's worth of cells at a time during an erase cycle. To imagine why this is the case, think of the typical mega-apartment building. Now imagine every person on a given floor with all their furniture/clothes/etc. Now imagine them all suddenly trying to cram into the hallway at once. Then imagine only a few apartments being emptied and moved at a time. It becomes obvious why FG based designs purposely slow things down during their erase cycle.
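A toy model helps put numbers on the 'small chunks' problem. In the sketch below (all timings are made-up units, not datasheet figures) any still-valid pages have to be copied out of a block before it can be erased, and a narrow interconnect means shuffling them out one page at a time:

```python
# Toy model of reclaiming a block: copy out the still-valid pages, then erase.
PAGE_COPY_US = 100      # made-up cost to move one page's data to a fresh block
BLOCK_ERASE_US = 2000   # made-up cost to erase the emptied block

def reclaim_block(valid_pages: int, pages_per_burst: int = 1) -> int:
    """Time to empty and erase one block, moving `pages_per_burst` pages at a time."""
    bursts = -(-valid_pages // pages_per_burst)   # ceiling division
    return bursts * PAGE_COPY_US + BLOCK_ERASE_US

# Narrow 'hallways': 64 valid pages shuffled out one at a time before the erase.
print(reclaim_block(valid_pages=64, pages_per_burst=1))   # 8400 units
```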
With Toshiba's BiCS design, the NAND is not laid out like a typical 'apartment building' with vertical and horizontal interconnect pathways. Instead, BiCS uses a multi-plane 'folded' design with 2 (BiCS 3 and 4) or 4 (BiCS 5) planes per 'building'… or put another way, it takes a long 'double' or 'quad' length stack of NAND cells and folds it in half, and then in half again. BiCS 6 keeps to 4 planes but then doubles the decks to 2. Meaning that instead of a design reminiscent of something out of the movie Inception, it actually kinda-sorta looks like an accordion stacked on top of other accordions… as envisioned by Salvador Dalí.
This different layout means each layer of NAND has a wider 'hallway' and more 'elevators' to use during a "move". This is why Toshiba/Kioxia BiCS can erase (or, in our analogy, move families out of their apartments) not one but multiple pages' worth of cells at one time. So, while both types of NAND are still erased at the block level… it will take a lot less time for the erase cycle to complete with BiCS based NAND. This means fewer cycles wasted on housecleaning, more cycles for real time I/O requests, and less chance of the end user noticing 'slow downs' when the pseudo-SLC NAND blocks are full and need to be erased before this 'buffer' can once again boost performance.
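Plugging the same toy reclaim_block() model from the earlier sketch back in shows why wider 'hallways' matter. Nothing here is a real BiCS figure; it only illustrates the shape of the saving:

```python
# Same hypothetical model as before, now comparing one-page-at-a-time housecleaning
# against moving several pages per burst the way a wider multi-plane layout allows.
narrow = reclaim_block(valid_pages=64, pages_per_burst=1)   # FG-style: 8400 units
wide = reclaim_block(valid_pages=64, pages_per_burst=4)     # BiCS-style: 3600 units
print(f"housecleaning time drops from {narrow} to {wide} units")
```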
On top of these significant differences, there is also one more major difference between Charge Trap and Floating Gate NAND designs. This is the likelihood of capacitive coupling occurring. Put simply, since CTF cells are less likely to become damaged, and since the damage is much smaller, capacitive coupling is also a lot less likely to happen. Capacitive coupling occurs when defects in the gate do not drain the cell's voltage enough for the controller to notice, but enough that these free electrons can bleed over to adjacent cells. When enough of this bleed over occurs, this extra electricity can jump into a neighbouring cell, and not only do the bits stored in the failing cell randomly change, the data in the cells around it also randomly changes.
Even more concerning, if this bleed over happens over a long enough period the surrounding cells not only have their voltage state silently changed but become linked – or 'fused' – together, via a pathway that is not monitored by the controller. When this happens, the controller thinks it is writing to one cell but is actually writing to multiple NAND cells at the same time. Needless to say, when these linked cells are read the low-level Error Correction Code immediately kicks in and reduces read performance as it tries to recover the corrupted data.
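For a feel of what that 'Error Correction Code kicking in' actually involves, here is a toy single-error-correcting Hamming(7,4) code in Python. Real SSD controllers use far stronger BCH or LDPC codes, but the extra detect-and-repair work on every read is the same basic idea:

```python
# Toy Hamming(7,4): 4 data bits protected by 3 parity bits, able to fix one flipped bit.
def encode(d):                              # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):                              # returns the corrected data bits
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3              # 1-based position of the flipped bit, 0 if clean
    if pos:
        c = c.copy()
        c[pos - 1] ^= 1                     # the extra work that eats into read performance
    return [c[2], c[4], c[5], c[6]]

word = encode([1, 0, 1, 1])
word[4] ^= 1                                # simulate one cell silently changing state
assert decode(word) == [1, 0, 1, 1]         # the original data is recovered
```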
When all this is explained to people, the most common question asked is 'why is Charge Trap not used for all NAND?!'. The answer is simple: it costs more to produce. That said, after all these years most now do use it. Yes, it took way, way too long, but out of the top five manufacturers of NAND all use a variation of it… as the rest who did not adapt (and were not mega-corps) no longer matter or even exist. For example, Samsung's 'V-NAND' is CT based, Toshiba/Kioxia's BiCS is CT based, SanDisk's BiCS is (obviously) CT based, SK Hynix uses 'CTF'… even Micron and their 'Replacement Gate' is basically CT (just without the name). As for Intel, Intel is no longer really in the NAND business beyond theoretical research like 5-bit-per-cell NAND or 'ideas' of somehow making floating gate viable. As such it is blatantly obvious that Grand Daddy Tosh may no longer be with us, but their legacy (via Kioxia) lives on as the right answer to NAND design.