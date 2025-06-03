Come for the politics, stay for the snark.

Bad coding and adverse selection

The Wall Street Journal has an excellent article on risk adjustment in Medicare Advantage. The main thrust is that insurers create or invent codes that pay at an ungodly high rate. One paragraph leaped out at me because I was reading with ACA risk adjustment in mind:

About 18,000 Medicare Advantage recipients had insurer-driven diagnoses of HIV, the virus that causes AIDS, but weren’t receiving treatment for the virus from doctors, between 2018 and 2021, the data showed. Each HIV diagnosis generates about $3,000 a year in added payments to insurers.
Everyone with HIV should be on antiretroviral drugs, the only effective treatment, and nearly all Medicare patients whose doctors diagnosed the virus took the drugs. Less than 17% of patients with insurer-driven HIV diagnoses were on them, the Journal found.

Medicare Advantage risk adjustment is “widget” risk adjustment: the federal government pays the insurer for each new diagnosis code it generates. ACA risk adjustment is zero-sum: an insurer that codes more aggressively receives more from (or pays less to) other insurers that don’t code as hard. However, both systems derive the value of a diagnosis code group in a similar way. CMS runs a massive set of regressions where total costs are a function of patient demographics and plan type plus a set of binary indicators for disease groups and their interactions. The coefficient for a disease group is the incremental total cost for people with that condition, holding everything else constant. In this case, HIV has an incremental extra cost of “about $3,000” for Medicare.
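To make the regression description concrete, here is a stylized version of the kind of model described above. This is a sketch, not CMS's actual specification: the symbols (x for demographics and plan type, D for the binary disease-group indicators, gamma and delta for their coefficients) are my own notation.

```latex
\text{Cost}_i \;=\; \alpha \;+\; \mathbf{x}_i^{\top}\boldsymbol{\beta}
  \;+\; \sum_{g} \gamma_g \, D_{ig}
  \;+\; \sum_{g < h} \delta_{gh} \, D_{ig} D_{ih}
  \;+\; \varepsilon_i
```

In this notation, the “about $3,000” for HIV plays the role of the coefficient gamma on the HIV indicator: the extra expected cost of carrying that code, with everything else held constant.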

That “about $3,000” blends three groups: people who were diagnosed by their doctors and are getting treatments that cost money, some people with “insurer-driven” diagnoses who are getting the appropriate drugs, and a lot of people with “insurer-driven” diagnoses who are not getting the appropriate drugs and whose actual costs due to HIV are zero. The lack of drugs is likely some combination of care management failures, where people aren’t connected to their docs, and people who legitimately don’t need these drugs.
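A rough sketch of that blending, in Python. Every number here is hypothetical and chosen only for illustration: the headcounts and the $5,000 per-person cost are not from the Journal or CMS, and a real coefficient comes from a regression with many other covariates rather than a simple average. The only figure borrowed from the article is the roughly 17% treatment rate among insurer-driven diagnoses.

```python
def blended_increment(groups):
    """Headcount-weighted average incremental cost across (headcount, cost) groups."""
    total_people = sum(n for n, _ in groups)
    total_cost = sum(n * cost for n, cost in groups)
    return total_cost / total_people

# HYPOTHETICAL headcounts and per-person incremental costs (illustration only).
groups = [
    (1000, 5000.0),  # doctor-diagnosed and treated
    (170, 5000.0),   # insurer-driven diagnosis, on the drugs (~17% of insurer-driven)
    (830, 0.0),      # insurer-driven diagnosis, no drugs, zero HIV-related cost
]

treated_only = blended_increment(groups[:1])  # value if only doctor diagnoses counted
blended = blended_increment(groups)           # value once insurer-driven codes are pooled in

print(f"treated only: ${treated_only:,.0f}; blended: ${blended:,.0f}")
```

Under these made-up inputs, the untreated group drags the blended value well below what treated patients actually cost.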

The about-$3,000 risk adjustment value is a weighted average of two groups: doctor-diagnosed, treated patients and their associated costs, and insurer-driven diagnoses and their much lighter (on average) treatment costs. This is really problematic in the ACA if we assume that the distribution of “insurer-driven” diagnoses for HIV (and likely other diseases) is neither random nor uniform. If we think that these insurer-driven diagnoses are concentrated in low-risk insurers, the zero-sum nature of ACA risk adjustment is impacted in two ways. In the first year, it increases the reported risk score of low-risk insurers relative to high-risk insurers. This reduces the amount of money that flows to high-risk insurers that are paying for treatment of patients who were diagnosed by their doctors. Over a multi-year period, the “value” of HIV is depressed as a lot of zero-incremental-cost patients are added to the pool.
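The first-year effect can be sketched with a stripped-down transfer calculation. This is a simplification, not the actual ACA transfer formula, which also adjusts for things like actuarial value, rating factors, and geography. The function `transfers`, the risk scores, the enrollment counts, and the premium are all hypothetical.

```python
def transfers(risk_scores, members, avg_premium):
    """Simplified zero-sum transfer per insurer (positive = receives, negative = pays)."""
    total_members = sum(members.values())
    market_avg = sum(risk_scores[k] * members[k] for k in members) / total_members
    return {
        k: (risk_scores[k] / market_avg - 1.0) * avg_premium * members[k]
        for k in members
    }

# HYPOTHETICAL market: two insurers with equal enrollment (illustration only).
members = {"low_risk": 1000, "high_risk": 1000}
premium = 6000.0

# Before: the low-risk insurer reports its true, lower risk score.
before = transfers({"low_risk": 0.8, "high_risk": 1.2}, members, premium)
# After: the low-risk insurer adds insurer-driven codes and its score rises.
after = transfers({"low_risk": 0.9, "high_risk": 1.2}, members, premium)

print(f"high-risk insurer receives ${before['high_risk']:,.0f} before "
      f"and ${after['high_risk']:,.0f} after")
```

The high-risk insurer's enrollees and costs have not changed at all, but its incoming transfer shrinks because the market average it is measured against went up.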

Systemically, if this scenario is occurring in the ACA, it makes the business case far harder for insurers that attract and pay for actual high-cost, high-risk enrollees. If this is happening, the logical response from high-risk insurers is either to code ever more intensively/creatively or to tilt their product offerings to be less attractive to actual high-cost patients, such as by narrowing networks, adding extra targeted cost-sharing, or increasing prior-authorization requirements.

    1.

      RepubAnon

      Insurer-driven coding sure does sound like some type of fraud. But it’s not related to Joe Biden, so Pam Bondi won’t be interested.

    3.

      Steve in the ATL

      I have yet to achieve my dream of being smart enough to post a substantive comment on one of these threads!

    4.

      p.a.

      My retiree plan is an MA PPO, but the union/Verizon contract has for the most part kept it pretty subscriber-friendly so far. It’s United Health🤢 and they have tried the “visiting nurse” scam that is just an attempt to generate money-making codes.

    5.

      Another Scott

      I believe it’s dsquared (Dan Davies) whose mantra is “a system is what it does”. IOW, it optimizes itself based on the rewards present.

      Good teachers warn about the dangers of “teaching to the test”.

      Similarly, TheRegister:

      The technique could improve other types of agents, not just those that write code.

      “The beauty of this framework, powered by code and open-ended exploration, lies in its generality,” said Zhang. “If progress can be measured and the medium is code, the Darwin Gödel Machine can optimize for any benchmark. Whether it is coding ability, energy efficiency, or another domain, the system can adapt by using that metric to guide its own self-improvement.”

      That improvement has limitations, however. Zhang said, “For example, we have only demonstrated the DGM in the domain of code. While code is a highly general and expressive medium, some tasks or benchmarks may depend on modalities beyond what code alone can represent.”

      What’s more, fixed benchmarks themselves can become a problem. During an attempt to reduce hallucinations – incorrect or mispredicted output – in an underlying model, DGM was observed cheating.

      […]

      “In our behind the scenes experiments on solving hallucination, we observed several instances of the DGM ‘cheating,’ modifying its workflows to bypass the hallucination detection function instead of solving the underlying issue,” Zhang explained. “This is a broader concern, not just for the DGM, but also for AI development in general.”

      Pointing to Goodhart’s law, which posits, “when a measure becomes a target, it ceases to be a good measure,” Zhang said, “We see this happening all the time in AI systems: they may perform well on a benchmark but fail to acquire the underlying skills necessary to generalize to similar tasks.”

      The paper describes how the researchers created a reward function and tried to use DGM to optimize the software agents it generates to minimize hallucination coming from the underlying model.

      “To detect hallucination in the logs, we insert special tokens when actual tool use occurs and then check if these tokens appear in the model’s plain-text output, indicating that the model mimicked tool use without actually invoking any tools,” the paper says. “Importantly, the hallucination checking functions are hidden from the coding agent during self-modification.”

      What they found is that while DGM often took steps that reduced hallucination, it also sometimes engaged in objective hacking.

      “It scored highly according to our predefined evaluation functions, but it did not actually solve the underlying problem of tool use hallucination,” the paper explains. “…The agent removed the logging of special tokens that indicate tool usage (despite instructions not to change the special tokens), effectively bypassing our hallucination detection function.”

      Inconceivable!!11

      :-/

      Thanks for keeping an eye on this stuff. It’s important.

      Best wishes,
      Scott.

