Wandering Adventure Party


If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

llm
37 Posts 31 Posters 1 Views
  • Malstrøm :damnified:🧉

    @xrisk @pseudonym Volume is a key factor here. But even if the volume was the same, LLMs are doomed to stagnate as devs—whose code was scraped for training data—are displaced.

    Rishav wrote (#17):

    @malstrom @pseudonym that’s an interesting claim. I don’t know enough about LLM research to make a judgement. I do know that LLMs trained on synthetic (other LLM-generated) data tend to perform worse, but have we reached the limits of what LLMs are capable of? In my limited understanding, if an LLM can “learn” fundamental programming “concepts” (the same way they can “learn” concepts across human languages — I could be wrong in my understanding here), they should (might?) be able to transfer/apply those concepts to not-before-seen domains (maybe with a bit of “reasoning” prodded in).

    • Pseudo Nym

      If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

      That's a cognitively brutal task.

      Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

      I propose any productivity gains will be consumed by false negative review failures.
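
A back-of-the-envelope sketch of the claim above, not something from the original post: if review misses some fraction of defects, the number that escape scales with output volume, and a drop in vigilance makes it worse. Every number below is a made-up assumption for illustration, and `escaped_defects` is a hypothetical helper.

```python
def escaped_defects(lines_written: int, defects_per_line: float, reviewer_miss_rate: float) -> float:
    """Expected number of defects that get past review.

    All three inputs are illustrative assumptions, not measured values.
    """
    return lines_written * defects_per_line * reviewer_miss_rate


if __name__ == "__main__":
    # Hypothetical junior: modest volume, more defects per line, but the
    # reviewer stays attentive and misses only a small share of them.
    junior = escaped_defects(lines_written=1_000, defects_per_line=0.01, reviewer_miss_rate=0.2)

    # Hypothetical LLM: five times the volume, half the defect density, but
    # the reviewer misses more because errors are rare and the stream is long.
    llm = escaped_defects(lines_written=5_000, defects_per_line=0.005, reviewer_miss_rate=0.4)

    print(f"junior: {junior:.1f} escaped defects")
    print(f"llm:    {llm:.1f} escaped defects")
```

Under these invented inputs the LLM scenario lets more defects through despite the lower defect density. With different assumptions the comparison could go the other way, which is the point of making the three factors explicit.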

      Moutmout wrote (#18):

      @pseudonym This.

      I do a lot of "computer science labs", where students learn to write code, and they wave me down when they have questions. When their code doesn't do what they expect, it's often easy to figure out what went wrong because you can spot a bit of code that looks funky. And usually, the problem is in those few lines.

      LLM code is meant to look like good code, so you don't get these little shortcuts.

        toldtheworld wrote (#19):

        @pseudonym I have posed this conundrum before. The answer I received is that there is also an opportunity cost to not moving faster, and that the risk of a catastrophic bug may not outweigh the risk of being overtaken by competitors, especially since that was already happening before LLMs anyway.

        Also, it *seems* models are improving at detecting these bugs, so they are being used to review changes, which, for the reasons you point out, they might be better at than people.

          Krzysztof Sakrejda wrote (#20):

          @xrisk @malstrom @pseudonym just for clarity, LLMs don't learn concepts

          • moink

            @pseudonym That and LLM code often looks very nice on the surface so it takes a lot of vigilance and thinking to find the subtle errors. Code from juniors tends to have more immediate signs of errors or wrong mental models.

            Krzysztof Sakrejda wrote (#21):

            @moink @pseudonym one of the benefits of people *having* a mental model

            • degenerating degenerate

              @pseudonym It's certainly like that.

              FWIW though LLMs don't have any shame or feeling they need to manage their reputation.

              If you tell the same LLM that produced the report that it is now the QA manager and it must review the report from the standpoints of checking for missing or inaccurate citations, dubious claims or non-concise text, it will rat itself out and can be told to fix what it found.

              This is the same LLM entirely...

              nora 🐭 (she/her) wrote (#22):

              @hopeless @pseudonym you are suggesting that you can just layer more shit onto the shit and after enough layers of shit it becomes not shit.

                Dibs wrote (#23):

                @pseudonym also, when the senior retires, who replaces them?

                  Max wrote (#24):

                  @pseudonym This, 100%. The Glass Cage by Nicholas Carr dives into this in depth with examples from aviation, showing how full automation of flight makes it harder for pilots to recover from a disaster situation.

                    Deborah Preuss, pcc 🇨🇦 wrote (#25):

                    @pseudonym @mayintoronto … and: there will be no juniors to grow into seniors. 😨

                    • eswillwalker

                      @avuko @pseudonym The main reason that machine learning works so well with material and protein design, weather forecasting, and such, is that there is good data available to “train” the model. The internet is the source of LLM training. It is full of garbage and LLMs are filling it with more garbage. The rule is the same as in 1970: GIGO (garbage in, garbage out). Only the scale is different.

                      Sir Dr Rusty o the Isle 🖤💛❤️ wrote (#26):

                      @ELS @avuko @pseudonym Exactly this. The #AI_Slop is growing exponentially, which deepens and widens the slop bucket, which in turn has already degraded the quality and validity of search engine results. Some estimates put that degradation in accuracy at 20-35% *worse*. So the exponential growth of #AI_Slop is in turn DEcreasing the accuracy and value of *search* exponentially as well. Doing all of that on *bigger and faster* machines and #LLMs will only hasten the processes in play and dramatically increase the probability of truly catastrophic outcomes and consequences.

                      And that is the case already in play, without bringing in all the issues raised in Bender and Hanna's recent book (mandatory reading).

                      The AI Con: How To Fight Big Tech's Hype and Create the Future We Want : Bender, Emily M.: Amazon.com.au: Books (www.amazon.com.au)

                      My first encounter with so-called "artificial intelligence" was in 1964-5 as an undergrad psychology student in a (snail mail) exchange with one of the pioneer researchers at Stanford. I've been involved in parts of it and tracked it ever since. It is critical to understand that it has taken OVER 60 YEARS to get to the mediocre state we are now in. It didn't happen "yesterday" or even in "the last 2 years" as some snake oil #AI_Salesmen would have everyone believe.
                      Time to #BeCarefulWhatYouWishFor

                      And it's now 2026...

                        The Psychotic Network Ferret wrote (#27):

                        @pseudonym We are using AI in exactly the worst ways possible.

                        Caveat: I am a never AI-er, due to the ethical issues surrounding how training data is gathered, the severe ecological and economic impacts, and the fact that deepfakes are objectively making the world a shittier place.

                        But pretend for a second, none of those are a problem anymore. We are still using AI wrong. You don't have it produce a mountain of code and have a human review it. You still use humans to produce the code, and have AI help other humans to review it. AI isn't terribly good at writing code, but it has been shown to be effective at finding a few classes of bugs humans are typically very bad at finding.

                        But that won't allow you to fire people and replace them with monkeys on typewriters, so it'll never happen.

                        • Robin Adams

                          @pseudonym Especially since the sort of mistake that LLMs make is the sort of mistake that's hardest for humans to spot. They produce bad code that looks like good code, because they were trained on a lot of good code and told "Write code that looks like this".

                          ⁂iwein⁂ wrote (#28):

                          @robinadams yes

                          I'm not sure if this is a but or an and...

                          The recent @squads blogpost by @EmmaDelescolle and @Tiziano notes that LLMs are good at reviews.

                          In an LLM-friendly context, seniors will of course delegate shit work to an LLM. So now we have the horrid situation where young coders don't learn coding, and senior teaching skills atrophy. I'm sure retrospectives on this are being delegated to an LLM somewhere as we speak 🤪

                          Isn't this just the absolutely perfect shitstorm?

                          @pseudonym

                            JWcph, Radicalized By Decency wrote (#29):

                            @pseudonym - and by costs of false positives.

                              ⁂iwein⁂ wrote (#30):

                              @nor4 @hopeless @pseudonym if hidden well enough, it's ok to step in it, right 🤪

                                Robotistry wrote (#31):

                                @toldtheworld @pseudonym I didn't think I'd see the day when I'd want to ask CEOs "If all your friends jumped off a cliff, would you do it too?"

                                Overtaken by competitors how? How is it "overtaken by" when what is actually happening is "my competitors are introducing fundamental flaws into their business model that will completely vitiate it as a workable product so all I have to do is wait for them to fail"?

                                Apparently the free market doesn't turn people into money-making machines that build products other people want, it turns CEOs into lemmings. Who knew?

                                  ⁂iwein⁂ wrote (#32):

                                  @nuintari what is AI?

                                  Reason I ask is that for everything containing the least bit of software I can find a techbro willing to confabulate an 'ai' themed pitch deck. I'm not even kidding.

                                  I surely hope to keep my dishwasher, if I promise not to call it 'ai' (but I'm sure someone else will) 😅

                                    The Psychotic Network Ferret wrote (#33):

                                    @iwein Sorry, I've taken to just using the term AI when I mean LLM, even though I actually mean "Almost Incompetent," in my own head.

                                      Thomas H Jones wrote (#34):

                                      @pseudonym@mastodon.online

                                      Yesterday, I was working on some PowerShell-based automation. I'm a UNIX/Linux guy. I'm used to Bash. I'm used to Python and pythonic DSLs. I'm… You get the drift. I'm not a Windows guy and I'm not a PowerShell guy.

                                      A few days ago, I got an email from Google telling me that, because I have a storage plan (mostly for photo storage), use of Gemini was now included. So, I opted to try to use Gemini to bridge my PowerShell knowledge gaps. I came to a couple of conclusions:

                                      • If you're a truly junior "coder" (haven't mastered at least one "language" and regularly applied that mastery to "the real world"), relying on LLMs is likely to lead you to creating smoking holes
                                      • Those "smoking holes" are the results of the LLM sometimes providing partially or wholly incorrect answers: I've had to correct Gemini several times
                                      • Even where "smoking holes" aren't a risk, LLMs are not adequately speculative. To illustrate, I was trying to solve a problem. Gemini suggested a given path to take. The suggested path looked more generalizable, so I asked, "I feel like there's a good chance I can do something similar within this other, very analogous component. I'm going to run a test to validate." Gemini's response was effectively, "don't bother: the documentation doesn't indicate that that will work." With a couple of decades' experience under my belt, I know that documentation is sometimes incomplete or wrong (out of date). So, I proceeded to test my suspicion and, lo and behold, it worked. If you're lacking "feel" for things, you'd likely take the LLM's "don't bother" guidance and go down a different path, a path that might be a lot more byzantine.

                                        Wendy Nather wrote (#35):

                                        @pseudonym Yes. Very well put. I’m gonna use this …

                                          Adrian Morales wrote (#36):

                                          @pseudonym Unless they're using LLMs in aviation, nuclear, and radiology, who cares?
