If a clinician just handed you a Kirby-Desai score and quoted you a session count to match, this is how to read that number. The scale is the closest thing laser tattoo removal has to a standardized session-count predictor. It was published in 2009, it is still named at most consultations, and the calibration research that has come out in 2024 and 2025 is consistent on one thing: the scale tends to over-predict against modern devices. All three of those things are true at once, and holding them together is the useful way to read your score.
Most secondary coverage of the scale stops at the scoring table. A summary reproduces the six factors, adds a short gloss, and presents the sum as a current-generation prediction. That is useful for understanding what the factors are. It is less useful as a planning tool for 2026, because the devices the scale was calibrated against in 2009 are not the devices a clinic is likely to own today, and the calibration literature of the last two years now speaks to that gap directly.
What follows walks through the six factors: what each one measures, and why it matters. Then what the original 2009 study actually showed, what the 2024-25 follow-up studies have found, and where the scale falls silent. Finally, how to hold the number you got: as a vocabulary for the consultation conversation, not a self-diagnostic. You can run your own score in the calculator when you want to see how your specific tattoo lands.
What the scale is, and where it came from
William Kirby, Alpesh Desai, and colleagues published the scale in the Journal of Clinical and Aesthetic Dermatology in 2009. It was built from a retrospective chart review of 100 patients treated at a single practice with Q-switched Nd:YAG and alexandrite lasers, the two dominant nanosecond-pulse devices for tattoo removal at the time. (“Q-switched” describes a pulse mode that releases energy in a few billionths of a second so the laser shatters ink without burning surrounding tissue.) The authors proposed the scale as “a practical tool to assess the number of laser tattoo-removal sessions required, which will translate into a more certain cost calculation for the patient.” A tool for budgeting, in other words, not a binding clinical prediction.
The method is additive. Six factors, each scored on a small integer scale, summed into a single number. That number is the authors’ predicted total sessions for the patient whose tattoo was scored. In their original 100-patient cohort, the predicted scores correlated with the observed sessions at r=0.757, p<0.001, with a mean observed treatment count of 9.91 sessions (standard deviation 3.18, range 3 to 20).
A later review by Ho and Goh in 2015 called the scale “a useful aid during patient counselling.” That third-party framing is the one to anchor on. The scoring table is a durable pedagogical tool. It names the variables that drive session-count variance in a form a reader can see at a glance. The specific number it returns is a different question, and one the modern literature now speaks to directly.
The six factors, scored
The full scoring table, from the 2009 paper:
| Factor | Point values |
|---|---|
| Fitzpatrick skin type | I=1, II=2, III=3, IV=4, V=5, VI=6 |
| Anatomical location | Head/neck=1, Upper trunk=2, Lower trunk=3, Proximal extremity=4, Distal extremity=5 |
| Pigment color | Black only=1, Mostly black with some red=2, Mostly black and red with other colors=3, Multiple colors=4 |
| Amount of ink | Amateur=1, Minimal=2, Moderate=3, Significant=4 |
| Scarring or tissue change | None=0, Minimal=1, Moderate=3, Significant=5 |
| Layering | No=0, Yes (cover-up)=2 |
Each factor is tracking something real about the physics and the biology of how laser tattoo removal works. None of them are arbitrary.
Fitzpatrick skin type
Fitzpatrick type (the standard six-point dermatology scale where higher numbers indicate more melanin in the skin) matters because melanin absorbs the same wavelengths the laser uses on ink. Darker skin holds more melanin in the epidermis, which means the laser has to be turned down to avoid burning the surface layer. Lower fluence (the laser’s energy per unit area of skin) clears ink more slowly. Higher Fitzpatrick also raises the risk of post-inflammatory hyperpigmentation (darkening of the skin after the laser settles) and hypopigmentation (patches of lost pigment), both of which push clinicians to space sessions further apart. More sessions, wider apart, for higher Fitzpatrick types. The point scaling (1 through 6) tracks this almost linearly.
Anatomical location
The body’s blood and lymphatic supply does the clearance work after the laser fragments the ink. Distal sites (feet, hands, lower legs, forearms) have less circulation and slower lymphatic return than central sites. Fragmented ink sits longer at distal sites. Clearance between sessions is slower, and the total count stretches. Lower-score sites (head, neck) carry shorter predicted session counts on the scale; higher-score sites (distal extremities) carry longer ones. The 1-to-5 scoring reflects that gradient.
Pigment color
Different inks absorb different wavelengths. Black absorbs broadly across the visible and near-infrared range and clears well at 1064 nm, the single most common wavelength for tattoo removal. Red responds to 532 nm. Green and light blue respond best to 694 nm (ruby) or 755 nm (alexandrite). Yellow, white, and fluorescent pigments resist most wavelengths, and some can paradoxically darken on the first pulse because of oxidation in the pigment itself. More colors in the tattoo means more wavelengths needed to address it, which usually means sequential passes with different devices, or a clinic with a multi-wavelength system. The scoring (1 through 4) compresses that into a rough proxy.
Amount of ink
Professional tattoos deposit more pigment per square centimeter than amateur tattoos, and deposit it deeper and more uniformly in the dermis (the skin layer below the surface). More ink means more laser pulses to fragment all the particles. The scale’s four-category treatment is a coarse proxy for what is a continuous variable. It is one of the places the scale gives up real precision in exchange for usability, a deliberate trade-off present in almost every clinical scoring tool.
Scarring or tissue change
Scar tissue from the tattoo process itself, from a previous removal attempt, or from unrelated trauma over the tattoo changes two things at once. It reduces the depth to which the laser penetrates, because scarred skin scatters light differently than healthy skin. It also disrupts the macrophage-and-lymphatic clearance pathway the body relies on to carry fragmented ink away after each session. Both effects push session counts up. Minimal scarring adds one point; moderate adds three; significant adds five.
Layering
Cover-up tattoos stack one tattoo on top of another to camouflage the original design. The total ink load in a cover-up is typically two to three times that of a single tattoo, because the cover-up ink has to be dense enough to hide the original. Layering is scored as a binary: no is zero, yes is two. It is the smallest additive factor, but it is the one that most consistently surprises patients who did not realize their cover-up was stacked over a previous piece.
What the total means
The minimum possible score, summing the lowest value of each factor in the table, is 4. Summing the highest values across all six factors yields a maximum of 26 (Fitzpatrick VI=6, distal extremity=5, multiple colors=4, significant ink=4, significant scarring=5, layering=2), reached only in the most extreme cases. Most real-world tattoos fall somewhere in the 6 to 15 range. The original authors flagged scores above 15 as cases that “may be difficult to remove and should be assessed by the physician to decide whether laser removal is the method of choice for the patient.” That threshold is the scale’s own handoff point from self-scoring back to consultation. Don’t skip past it.
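The additive method is simple enough to sketch in a few lines. The point values below are copied from the 2009 scoring table; the function name, category labels, and example tattoo are illustrative choices of this article, not the paper’s.

```python
# Kirby-Desai scoring, as an additive lookup over the six factors.
# Point values are from the published 2009 table; everything else
# (names, structure) is an illustrative sketch.

FACTOR_POINTS = {
    "fitzpatrick": {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6},
    "location": {"head/neck": 1, "upper trunk": 2, "lower trunk": 3,
                 "proximal extremity": 4, "distal extremity": 5},
    "color": {"black only": 1, "mostly black, some red": 2,
              "black and red with other colors": 3, "multiple colors": 4},
    "ink_amount": {"amateur": 1, "minimal": 2, "moderate": 3, "significant": 4},
    "scarring": {"none": 0, "minimal": 1, "moderate": 3, "significant": 5},
    "layering": {"no": 0, "yes": 2},
}

def kirby_desai_score(**choices: str) -> int:
    """Sum the six factor scores into the scale's predicted session count."""
    return sum(FACTOR_POINTS[factor][value] for factor, value in choices.items())

# A mid-range example: Fitzpatrick III, forearm (distal extremity),
# black-only professional tattoo, moderate ink, no scarring, no cover-up.
score = kirby_desai_score(
    fitzpatrick="III", location="distal extremity", color="black only",
    ink_amount="moderate", scarring="none", layering="no",
)
print(score)        # 3 + 5 + 1 + 3 + 0 + 0 = 12, under the >15 threshold

# The scale's full range, read directly off the table.
min_total = sum(min(points.values()) for points in FACTOR_POINTS.values())  # 4
max_total = sum(max(points.values()) for points in FACTOR_POINTS.values())  # 26
```

Summing each factor’s lowest and highest value gives the scale’s full range directly from the table, which is a quick way to sanity-check any score a calculator hands you.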
What the original study actually showed
The 2009 paper is a retrospective chart review. The authors went back through the records of 100 patients already treated at their practice, scored each tattoo after the fact, and then checked the scores against the observed number of sessions each patient had received. It is not a prospective validation on a separate cohort. It is not a randomized trial. It is a correlation study on a single practice’s records.
The headline numbers from that cohort: a correlation coefficient of 0.757 between predicted and observed sessions, significant at p<0.001 (meaning the correlation is unlikely to be random chance), with a mean observed treatment count of 9.91 sessions, a standard deviation of 3.18, and a range from 3 to 20 sessions. The authors themselves acknowledged the cohort had “a completely successful tattoo-removal rate” and “may not include sufficient difficult-to-treat tattoos (scarring, layering tattoos),” which is a selection bias toward easier cases that limits how the scale’s upper end was tested.
A figure that circulates in secondary coverage of the scale is that it is “80% precise.” That phrasing does not appear in the original paper. What the paper reports is a Pearson correlation of 0.757, a measure of how tightly two variables move together, not a per-patient precision rate. Squaring that correlation gives a coefficient of determination of about 0.57, which is the share of the variance in observed sessions the scale explains in that original cohort. The 2025 Menozzi-Smarrito and Pineau head-to-head puts the per-patient prediction error of Kirby-Desai at root-mean-square error (RMSE) of 3.7 sessions on their twelve-case validation set, which is the cleanest modern figure on the original scale’s per-patient precision; their newer model lowered that to 2.2 sessions in the same comparison (the calculator methodology article walks through what that means for the displayed range). If you encounter the “80% precise” framing, it does not come from the scale’s authors.
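The arithmetic in that paragraph is easy to verify. A short sketch using only the figures quoted above (the Pearson r from the 2009 paper, the two RMSE values from the 2025 head-to-head):

```python
# Checking the quoted figures directly.

r = 0.757                    # Pearson correlation, 2009 cohort
r_squared = r ** 2           # coefficient of determination: variance explained
print(round(r_squared, 2))   # 0.57 -- not a "per-patient precision rate"

rmse_kirby_desai = 3.7       # sessions, 12-case validation set (2025 study)
rmse_new_model = 2.2
reduction = 1 - rmse_new_model / rmse_kirby_desai
print(round(reduction * 100))  # 41 -- the "41% reduction in prediction error"
```

The point of the exercise: 0.757 squared is about 0.57, so even in its own cohort the scale left roughly 43% of the session-count variance unexplained, which is a different claim from “80% precise.”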
Two distinct things are being described here, and holding them apart prevents a common misreading. Discrimination (whether the scale correctly ranks tattoos from easier to harder to remove) is what r=0.757 measures, and the scale does this reasonably well. Calibration (whether the absolute numbers it produces match the true session counts) is what the 2024-25 papers are testing, and against modern devices the calibration is the weak link. A scale can rank patients accurately while its absolute predictions are systematically high. Those are different properties of the same tool.
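The distinction can be made concrete with made-up numbers (illustrative, not study data): a predictor that runs exactly four sessions high on every patient ranks them perfectly, so its correlation is 1.0, while every absolute prediction is wrong.

```python
# Discrimination vs. calibration, on toy data.
import math

observed  = [4, 5, 6, 8, 10, 12]    # sessions actually needed (invented)
predicted = [8, 9, 10, 12, 14, 16]  # a scale that is always 4 sessions high

def pearson(xs, ys):
    """Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Discrimination: the ranking is perfect, so correlation is 1.0 ...
print(round(pearson(observed, predicted), 6))   # 1.0
# ... but calibration is off: every prediction runs 4 sessions high.
bias = sum(p - o for p, o in zip(predicted, observed)) / len(observed)
print(bias)                                     # 4.0
```

A high r, in other words, is compatible with every single prediction being wrong in the same direction, which is exactly the pattern the calibration studies report.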
What has changed since 2009
Two things have changed since the scale was published: the devices, and the technique.
On devices: picosecond lasers, machines that fire pulses on the order of 300 to 750 picoseconds (an order of magnitude faster than the typical 6-to-10-nanosecond Q-switched pulses the scale was calibrated against), became commercially available in 2013, when the FDA cleared the first picosecond platform (PicoSure), and are now standard at well-equipped clinics. The clinical literature supports picosecond lasers clearing compatible inks (primarily black and dark blue) in fewer sessions than nanosecond devices on many patients, though the magnitude of that reduction varies widely by study, by population, and by ink.
On technique: combination approaches have become more common. Multi-wavelength devices let clinicians address multiple ink colors in the same session. The R20 method (multiple passes in a single session, separated by roughly 20-minute waits) and the R0 method (perfluorodecalin patches applied between passes so the skin recovers without the wait) can compress sessions in some hands. None of these were part of the 2009 cohort’s treatment.
The calibration studies of the last two years are also, in a methodological sense, the external validation the scale has needed: independent cohorts, independent clinicians, tested against the original scale’s predictions outside the practice that built it. Three of them are the ones to know.
Egozi and colleagues published a retrospective in 2024 comparing Kirby-Desai predictions against actual treatment counts in 11 patients treated with a short-pulsed Q-switched dual-wavelength Nd:YAG device (per the paper, a four-wavelength platform of which two wavelengths were used in this study). The average actual number of treatments was 5.09. The average KD prediction was 9.9. The difference was significant at p<0.001, and the authors called this “a new perspective” on the scale. With 11 patients, the confidence interval around that gap is wide; the direction of over-prediction is the finding here, not the precise magnitude.
Menozzi-Smarrito and Pineau published a new predictive model in 2025 built on 116 patients treated with a PicoSure 755 nm picosecond laser, Fitzpatrick I-IV. They then validated their model head-to-head against Kirby-Desai on 12 cases. Their model produced an RMSE of 2.2 sessions; the Kirby-Desai scale on the same cases produced an RMSE of 3.7 sessions, a 41% reduction in prediction error, not in session count. The study did not include Fitzpatrick V-VI patients, so the improvement should not be extended to darker skin types. What the study does say clearly is that the modern picosecond baseline admits a tighter model than the 2009 scale, at least in the Fitzpatrick I-IV range the authors tested. That is not just a critique of the original scale; it is the derivation and internal validation of a candidate successor model.
Aurangabadkar and colleagues published a prospective study in 2019 often cited as evidence of KD over-prediction in darker skin, and it deserves a careful read. Their cohort was 22 patients (Fitzpatrick IV: 9 patients, 40.9%; V: 12, 54.5%; VI: 1). Every patient had an amateur, single-color (black) tattoo. Treatment used a Q-switched Nd:YAG 1064 nm device with the R0 technique (a liquid called perfluorodecalin applied between passes to speed skin recovery), which is not routine at US clinics. The figures should be read as a ceiling on what an amateur black tattoo on darker skin can compress with specialized technique, not a typical course for any other tattoo type. On that basis: KD predicted 7 to 14 sessions with a mean of 9.7, and actual sessions ranged from 1 to 4, with 68% of patients completing in 1 to 2 sessions. The direction of the finding (KD over-predicts) holds; the magnitude in this study should not be read as typical modern practice for professional or multi-color tattoos.
Taken together, the three studies point the same direction. The scale tends to over-predict session counts against modern devices and technique. The exact size of that gap varies by study, device, and population, and none of the three was designed to represent the full range of real-world patients. Egozi’s 11 cases ran on one device; Menozzi-Smarrito’s Fitzpatrick I-IV cohort ran on a single picosecond platform; Aurangabadkar’s cohort had amateur black tattoos with a non-routine technique. The over-prediction direction is consistent; no published study has converged on a single correction factor. The reading that holds: modern practice tends to clear faster than the 2009 scale predicts, and the clinician you sit with will tighten the estimate in person.
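Putting the two studies that report comparable counts side by side shows why no single correction factor has emerged. The figures are as summarized above; Aurangabadkar reported a 1-to-4 session range rather than a mean, so the midpoint used below is an assumption of this sketch, not a reported statistic.

```python
# Per-study over-prediction gaps, using the figures quoted in this article.

studies = {
    # study: (mean KD prediction, observed sessions)
    "Egozi 2024":         (9.9, 5.09),  # mean actual treatments, 11 patients
    "Aurangabadkar 2019": (9.7, 2.5),   # midpoint of the 1-4 range (assumption)
}

for name, (predicted, actual) in studies.items():
    ratio = predicted / actual
    print(f"{name}: KD over-predicts by a factor of ~{ratio:.1f}")
```

One cohort lands near a factor of two, the other near a factor of four: same direction, very different magnitudes, which is the article’s point about the absence of a settled correction factor.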
This is a well-documented pattern in clinical scoring tools. The Framingham cardiovascular risk score, derived in the 1950s, required formal recalibrations in 1998 and 2008 as statin use and population demographics shifted under it. A scale calibrated at one clinical moment becomes progressively miscalibrated as the clinical landscape moves. No published revision to the Kirby-Desai scoring table has appeared since 2009, and the table on clinic pages in 2026 is the same one published then.
What the scale does not capture
The calibration literature above is about how the scale’s number runs high against modern devices. Even a perfectly calibrated 2009-era tool would still miss what follows: the factors the scale does not score at all.
Six factors is fewer than the number of things that drive session count. The gaps are known.
Tattoo age. Older tattoos (roughly 10 years or older) tend to clear faster than recent ones, because the body has already partially cleared superficial ink through ongoing macrophage trafficking over the years since the tattoo was applied. The Menozzi-Smarrito and Pineau 2025 model identified tattoo age as a predictor; the Kirby-Desai scale does not score it. (If your tattoo is more than a decade old, that is a small piece of good news the scale will not give you credit for.)
Ink chemistry beyond color categories. “Multiple colors” is treated as a single category. The scale does not distinguish between a tattoo containing titanium-dioxide white (which can paradoxically darken from oxidation on the first pulse) and a tattoo that is mostly black with a small red accent. Modern ink chemistry is more varied than the 2009-era palette the scale was built against, and some newer pigments behave in ways the scale does not predict.
Immune and systemic factors. Smoking, age, diabetes, and immune suppression all affect the macrophage-based clearance pathway the laser relies on. A Fitzpatrick III non-smoker in her thirties will clear fragmented ink faster than a Fitzpatrick III chronic smoker in her sixties, all else equal. None of this is in the scoring table.
Operator and device variables. The site’s editorial position, consistent with the clinical literature’s emphasis on operator variables, is that operator training, wavelength selection, and fluence calibration drive more variance in outcomes than device brand alone. The scale implicitly assumes a competent operator using an appropriate wavelength for the ink. That assumption holds for the purpose of the scale itself; if you are planning a real budget, treat the operator as a variable you should ask about directly.
Ink density granularity. The Menozzi-Smarrito and Pineau 2025 analysis found that ink density was the strongest predictor of session count in their cohort, with finer granularity than the scale’s four-category amateur-to-significant scheme. A dense blackwork sleeve and a minimal line-work piece can both score the same on amount of ink. They do not behave the same way under the laser. Tattoo age and refined ink density are precisely the kinds of variables a next-generation scale could absorb.
There is also a gap in the published evidence, which is not a gap in the scale itself but is worth naming. No study has validated the Kirby-Desai scale specifically on a picosecond-treated Fitzpatrick V-VI cohort. The Aurangabadkar 2019 study is the closest, but it used nanosecond Q-switched plus R0 technique on amateur tattoos. The modern picosecond evidence base for darker skin is thin across the board, and the modern picosecond evidence for how the scale calibrates in darker skin is essentially absent. If your skin type is Fitzpatrick IV-VI, in-person consultation with a board-certified dermatologist or other appropriately licensed clinician (rather than a med-spa with an unlicensed tech) carries extra weight precisely because the published guidance for your skin is thin.
How to actually use the number
Clinical methodologists distinguish between a scale’s accuracy and its utility. The more useful question to ask of the Kirby-Desai scale is not “are the predictions right” but “does using it improve the consultation conversation compared to not having it.” A scale that over-predicts by two sessions but gives you and the clinician a shared vocabulary for a structured conversation can still be net positive. That is the bar Kirby-Desai clears in 2026: the framework is useful, the absolute numbers should be held loosely.
A low total (in the single digits) means your tattoo sits in a cohort where modern devices and technique have the most room to compress the count below the 2009 cohort’s experience. It does not mean you will need exactly that many sessions. It means the clinician, seeing the tattoo in person, will refine the estimate against your specific tattoo and the device they use. The 2024-25 calibration literature is clearest at this end of the range.
A mid-range total (roughly 10 to 15) is the zone where the scale is most useful as a conversation tool, and also the zone where the gap between its prediction and modern observed counts tends to be widest. Two questions are worth bringing into the consultation: what device the clinician will use, and how they calibrate their session-count expectations against the scale. The answers vary. Some clinicians apply the 2009 numbers directly. Others explain how they adjust for picosecond devices and modern technique. Still others use the scale as one reference among several. Whatever the answer, you learn where the estimate is coming from: a clinician who quotes back exactly the scale’s number with no adjustment language is working from the 2009 baseline, while one who walks you through their device-and-technique adjustments is working from the current literature. Both answers tell you something useful. A clinician who cannot or will not explain their method is the one answer that should prompt a follow-up question.
A total above 15 triggers the original authors’ own recommendation: physician assessment to decide whether laser removal is the right method for you. Not a self-diagnosis as a “hard case,” not a signal to give up, and not a ceiling. A handoff back to the in-person consultation the scale was always meant to prepare you for.
The American Academy of Dermatology’s patient-facing page on laser tattoo removal takes a similar line. The AAD names the variance factors (tattoo age, ink penetration, colors, location, health, medications, scar history) and does not quote a predicted session count. That is the model. Name the factors, name the variance, and leave the specific number to the clinician who has seen the tattoo.
What the scale gets right, and what it doesn’t
The Kirby-Desai scale has two pieces. One is still useful. The other has aged.
The factor list (Fitzpatrick type, location, color, amount, scarring, layering) is a durable enumeration of the variables that drive how many sessions a given tattoo is likely to need. That list is why the scale still gets named at consultations and why the scoring table still appears in patient-facing summaries. It is a shared vocabulary for a conversation that would otherwise have no structure.
The specific predicted number is a 2009 estimate from a retrospective chart review of 100 patients treated with Q-switched nanosecond devices. Modern devices and technique compress session counts below that estimate in each of the calibration studies reviewed here, though by different magnitudes and with no settled correction factor. A clinician who has seen your tattoo will narrow the estimate; if you are working from the scale alone, hold the KD number as the high end of your planning range, not the expected outcome.
Whichever total the scale returns, the next move is the consultation. Listen for whether the clinician works with the scale or against it, and how they explain the difference. That is the useful work the scale can still do in 2026. The session-count range it produces is one half of the budgeting question; our cost explainer walks the per-session rates and total-cost math that pairs with it, and the calculator takes the same six inputs and returns the modern picosecond-corrected range.
Sources
- Menozzi-Smarrito and Pineau (2025) (pmc.ncbi.nlm.nih.gov)
- Kirby et al. (2009) (pmc.ncbi.nlm.nih.gov)
- Ho and Goh (2015) (pmc.ncbi.nlm.nih.gov)
- Aurangabadkar et al. (2019) (pmc.ncbi.nlm.nih.gov)
- Egozi and Toledano (2024) (pubmed.ncbi.nlm.nih.gov)
- American Academy of Dermatology: laser tattoo removal patient guidance (www.aad.org)