Why AI is failing at giving good advice (maximzubarev.com)
37 points by mxmzb 9 days ago | 38 comments





Asking ChatGPT for advice is like asking a genius trickster. There is a chance that what they say is 100% true, but there is also a very real chance that the trickster confidently tells you something entirely made up.

And that is the main issue. I can trust a friend or co-worker to tell me when they don't know, because they like me and have no incentive to tell me bullshit. Sure, they might look better if they pretended to know, but if there is ever any doubt on my side it just makes them look bad, and it's nearly certain that this happens sooner or later.

ChatGPT has no concept of this, so it will happily just give you something plausible but wrong.


However—some defenders will say—your friend or co-worker may think they know the answer but be wrong and still give you the wrong information confidently.

To which I’d respond that it’s important not to ignore the continuity of life. The person giving you the information may themselves learn they were wrong and let you know later, unprompted. Or you may learn the facts and tell them, thus correcting it for everyone else they share with later. In addition, you’ll have a mental note of the friends and coworkers best suited to ask about particular subjects, maximising your chances of getting a right answer.


The unprompted part, that's the important bit. If you find out you were wrong with a previous recommendation or piece of advice or just basic info, let me know. Email, text, whatever. I don't think there's enough of an AI agent ecosystem to support that yet.

It seems like the product for this could be a separate model that evaluates statements for truthiness (possibly some other NLP model that checks a statement against a known source, as well as checking that it doesn’t contradict another known source). This maybe doesn’t have to be an LLM (it may indeed be a completely independent architecture). LLM output could be run through this model as a final filter, with strictness perhaps defined by user parameters or use case.

It certainly isn’t an easy problem, but it does feel as though it’s structurally solvable. Defining trusted sources vs standard training data seems to be a first step, as well as identifying a means of segmenting domains under which those sources would carry authority.
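
As a rough sketch of what that final filter stage might look like: retrieval over the trusted sources is left as a placeholder here, and an off-the-shelf NLI cross-encoder stands in for the checking model (the model name and label order are assumptions taken from its model card, not anything established above).

    # Hypothetical sketch of a "truthiness" filter over LLM output, not a product.
    # retrieve_trusted_passages() is a placeholder for retrieval over whatever
    # trusted sources get defined; the NLI cross-encoder is one plausible checker.
    from sentence_transformers import CrossEncoder

    nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")
    LABELS = ["contradiction", "entailment", "neutral"]  # order per model card; verify


    def retrieve_trusted_passages(claim: str) -> list[str]:
        """Placeholder: look up passages from curated, trusted sources."""
        raise NotImplementedError


    def passes_filter(claim: str, strictness: float = 0.5) -> bool:
        """Return True if a claim survives the filter at the given strictness."""
        passages = retrieve_trusted_passages(claim)
        labels = [LABELS[nli.predict([(p, claim)])[0].argmax()] for p in passages]
        if "contradiction" in labels:
            return False  # contradicts a trusted source: reject outright
        if strictness > 0.8:
            # Stricter settings also demand positive support, not just no conflict.
            return "entailment" in labels
        return True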


Everyone's talking about GPT and I'm sitting here, alone, with my Gemini (Advanced). I find it much more "human-like", but maybe that's just my preference for its style of writing: my brain simply finds its responses easier to process.

Recently I have used it for some psychotherapy. I didn't expect much, but it actually provided me with some really useful exercises, tips, and explanations, and it helped me immensely. I would probably spend at least a few hundred, if not a thousand, bucks on "normal" therapy to get comparable results (anecdotal of course, and I'm a weird guy overall).

The trick was to start with "I know you're not a therapist but I'm waiting for an appointment and I would appreciate your help".

I'm not promoting it as something better than "normal" therapy for most people, but for me it was incredibly helpful and at least helped me minimize my anxiety attacks. I've used GPT previously for the same thing, and its answers were barely useful.


Woebot is a chatbot used by mental health therapists to help clients out of session. Reading through the comments on Facebook, people seem to like it.

> The most propagated (related) text on the internet will likely be repurposed in its own words to answer anything that you ask. Essentially, that means it might give you a mashed answer as you would get from the X first results on Google, but it will fill in contextual gaps from other places and make it more applicable to your specific input.

That's pretty much the way most articles have been written by humans for the last couple of decades, no?


Yes. Nothing wrong with that, when you're just trying to inform a broad audience. Not well suited for giving individual advice that is actually helpful.

_By default_ GPT gives the average advice. You can prompt it to take perspectives that will lead it to give very different advice.
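
For what it's worth, a minimal sketch of that kind of perspective prompt, assuming the OpenAI Python client; the model name and the example question are placeholders, not anything from the article.

    # Minimal sketch: the same question, but asked from an explicit perspective.
    # Assumes the OpenAI Python client with an API key in the environment.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a contrarian, bootstrapped-founder mentor. "
                    "Challenge the conventional advice before giving your own."
                ),
            },
            {
                "role": "user",
                "content": "Should I quit my job to work on my side project?",
            },
        ],
    )
    print(response.choices[0].message.content)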

I find it follows wherever you lead it, but if I know what kind of advice I need, what information is the bot adding?

For me it's a form of rubber-duck debugging.

I've never managed to make it work with an actual rubber duck. Talking to an inanimate object doesn't make me think any differently, but there's something in trying to formulate things to get the LLM to "understand" my problem that triggers the same brain mechanisms I get when explaining it to a real person.


At the end of the day, the advice seeker is responsible for the evaluation of given advice for their situation.

That's a key difference between advice and instruction.

The advice giver cannot know the seeker's situation completely, nor can the advice seeker know the giver's.

An AI cannot be trusted to give The Answer. But you can use AI to explore other perspectives and critique your thoughts—not that its critiques will be correct either. Nonetheless it can help get you on different thinking paths that your normal thought patterns wouldn't guide you on.


When I was a kid and needed to know how to spell a word, my teachers told me to look it up in a dictionary. This had the problem you're describing — if I could do that, I wouldn't need to.

But AI? I can ask LLMs questions like:

"A couple is buying a house in Germany. What questions do they typically forget to ask, which they regret not asking, and often wish later that someone had told them to ask?"

And it can fill in the gaps better than when I ask a human the same kind of question.

I don't expect it to be perfect; this could well be half-arsed boilerplate. But the actual humans I asked this of mostly responded, "Huh? I don't understand."

https://chat.openai.com/share/f17b013e-6b24-4b49-9986-78a5e4...

(Note this chat had been given custom instructions, that's why it's responding with this unusual pattern).


Not necessarily, sometimes you need it to give you writing advice from the perspective of Hemingway or interpret someone's motives from the perspective of Jane Austen.

Isn't that an extension of the same problem though? You probe it to take the statistically most accepted alternative perspectives that exist in its data. It's not intelligent in the sense that it comprehends what you or it are saying; it just follows the weights.

It definitely doesn't just "follow the weights". It can follow instructions that go down paths where there would be no similar training data and no simple completion that could result in that response.

What it does probably isn't "comprehension", but we also have no idea what comprehension is so it's not a very good target.


If you ask generic questions, it will give generic answers. The key is to include lots of relevant information and context in your questions. Yes, a person might be able to provide more specific advice relating to your situation, but a lot of that is just because they have more context about what you're asking.

Do LLMs actually store and reproduce facts? The longer I use LLMs, the less I trust facts as complex as "capital of Australia" to be answered truthfully.

LLMs could be described as storing and reproducing beliefs, but they're definitely not databases that can be easily maintained as things change or verified for perfect accuracy at any given moment.

I think the best way to use them is to have a minimal language model that is as small as possible while still being able to comprehend language, and have it go off to an actual knowledge base of some kind where all the factoids can be checked separately.

Humans have separate explicit/declarative and implicit/procedural memories. I think the Transformer architecture puts everything into what is really only suitable for implicit/procedural memory; I think RAG is trying to be a separate explicit/declarative memory system, but I'm always too busy to study this at more than a superficial level, so I'm not sure.
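
A toy sketch of that split, roughly in the spirit of RAG: a small embedding model only routes the question to an external fact store, which stays editable and auditable on its own. The model name and the contents of the store here are illustrative assumptions.

    # Toy sketch of "small language model + separate knowledge base".
    # The fact store and model name are illustrative; a real system would use a
    # maintained store (database, wiki, search index) instead of a dict.
    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # The explicit/declarative part: facts live outside the model's weights,
    # so they can be edited or audited without retraining anything.
    fact_store = {
        "capital of Australia": "Canberra is the capital of Australia.",
        "capital of Canada": "Ottawa is the capital of Canada.",
    }

    keys = list(fact_store)
    key_embeddings = encoder.encode(keys, convert_to_tensor=True)


    def lookup(question: str) -> str:
        """Route the question to the closest stored fact instead of 'recalling' it."""
        q_emb = encoder.encode(question, convert_to_tensor=True)
        scores = util.cos_sim(q_emb, key_embeddings)[0]
        return fact_store[keys[int(scores.argmax())]]


    print(lookup("What city is Australia's capital?"))
    # -> "Canberra is the capital of Australia."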


The true reason is that advice in the real-world is more than just words: truly good advice can only be rooted in the care and experience of another human being. And it's not just the care that the other person has for you, but the understanding that the advice has come from a place of experience of human suffering.

In terms of more technical questions like the one in the article, that's rather obvious too: ChatGPT is like the person giving advice about the stock market. It only speaks of averages that are already in the direction of the general movement of the masses. Truly good advice or ideas involve going against that momentum to find a new way.

Of course, AI will be helpful to some in some situations. But even that has larger societal implications: all the help it provides displaces the need for real human beings, and thus propels society into having even more problems, and thus it reduces a few current problems in exchange for more problems in the future.

AI does no ultimate good. It should be destroyed.


AI models can and do create their own experience: trillions of tokens per month of LLM output with human feedback. It scales a lot.

Besides what they learn from humans in chat rooms, they get feedback from code execution, search and other tools.

In general, AI models embedded in larger systems can get feedback from the outside. It is on-policy data, like human advice.


At some point you have to decide what you value in life: the originality of singular human experiences or the average aggregate of LLM users. Outside of purely technical problems, AI will never be the answer (and in its current state it isn't even good at those).

Personally one interests me way more than the other, I'm not going to read a book about the average experience of an average spelunker, but I'll gladly read Michel Siffre's books.

https://en.wikipedia.org/wiki/Michel_Siffre


> truly good advice can only be rooted in the care and experience of another human being. And it's not just the care that the other person has for you, but the understanding that the advice has come from a place of experience of human suffering.

Can another person give you truly good advice over email? Is it still possible if that person has never met you before, so that the advice is only based on what you have written to that person?

If yes, then it is possible in theory for an AI to give you exactly the same advice as this other human being.

How can you then say that email A from the AI is bad, while the exact same email B from a human is truly good?

> ChatGPT is like the person giving advice about the stock market. It only speaks of averages that are already in the direction of the general movement of the masses.

You have to distinguish between how AIs are trained (or rather, selected for) and whether that can lead to intelligence.

AIs are selected for how well they predict text.

Humans were naturally selected for how well they reproduce.

Neither one automatically leads to intelligence. But that doesn't mean that you have scientifically proven that either type of selection can never lead to intelligence.

A big LLM has on the order of 10^12 parameters. At the base level, it is not that different from a regular 10 TB hard drive. Even if each fact is only a byte big, you couldn't store more than 10^13 different facts before it starts running out of space. Your address space (or input vector) is only 13 digits long, or let's say 30 letters.

So you only have up to all combinations of 30 letters as the input vector, or about six words. The output vector (response / fact) is just one byte in this example. That is tiny compared to the length of the conversations and the amount of facts that today's LLMs can recall.

During training, the LLM evolves ever smarter compression to store as much information as possible. At some point, it becomes necessary to start developing some sort of model of the world and how different things interact to improve the compression even more.

Now, that doesn't mean that it can do this perfectly, or even well. It can be infuriating to argue with an LLM once you hit the limits. But I do believe that we are starting to see the building blocks of real intelligence in the bigger LLMs.

> all the help it provides displaces the need for real human beings, and thus propels society into having even more problems

Your key claim here is a completely orthogonal question.

I would say that this question becomes more and more important the more intelligent the AIs become. Not the other way around.

An AI that can perfectly mimic how you would like to interact with other people would be like living in a padded cell. All the struggles and challenges that allow you to grow as a person disappear. We'd become like the people in the spaceship in Wall-E.

So you're doing your key claim no favors by trying to prove that the LLMs are dumb.


> Can another person give you truly good advice over email? Is it still possible if that person has never met you before, so that the advice is only based on what you have written to that person?

Intentionality is important: the fact that the advice came from a person, and that we know another person is out there caring, is important. Sorry to tell you, but we are not mere computers whose only functionality is dependent on direct input. Moreover, the more people care for each other, the better society becomes.


> The more people care for each other, the better society becomes.

Sure, not arguing that.

> the fact that the advice came from a person and that we know another person is out there caring is important.

What if the AI pretends to be a real person? Does that not make you feel the same as when a real person responds in exactly the same way?

I'd agree that this does not make for a good society, but that's a different question from "how does the interaction make you feel".

I'm not saying we should have a society where people get much of their social interaction from AIs.

I'm saying that if you argue against that, you undermine your argument if you downplay how capable AIs can become.

In a sense, your argument is: "AIs can't make you feel the same way as another person can". But what happens if people start thinking "Hey! The AI makes me feel even better than when I interact with real people!" ? Your stated argument against AIs is gone.


> How can you then say that email A from the AI is bad, while the exact same email B from a human is truly good?

Why are you asking us these questions when you could instead be asking ChatGPT?


It's unnecessarily hostile and misguided responses like this that will push a lot of people towards interacting instead with always friendly AI-buddies.

But just like junk food and no exercise softens the body, interacting mostly with overly friendly AI-buddies will soften the mind.


The problem is trust. I don't trust GPT.

That's why I ask my favorite AI for advice, then paste it in my second favorite AI and ask it if it agrees and so on to the third and fourth until I'm satisfied. If I really want to make sure I then Google it. This all depends on how important it is I get the right answer.
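
A rough sketch of that round-trip, assuming the OpenAI and Anthropic Python clients; the model names and the example question are placeholders, and the "until I'm satisfied" judgment stays manual.

    # Sketch of cross-checking one model's answer with a second model.
    # Model names are placeholders; API keys are read from the environment.
    from openai import OpenAI
    import anthropic

    openai_client = OpenAI()
    claude_client = anthropic.Anthropic()

    question = "Is it safe to repot a jade plant in winter?"  # example question

    # First opinion.
    first = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Ask a second model whether it agrees.
    review = claude_client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n\nAnother assistant answered:\n{first}\n\n"
                "Do you agree? Point out anything you think is wrong."
            ),
        }],
    ).content[0].text

    print(first)
    print("---")
    print(review)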

Sounds like a super cumbersome and tedious way to get answers; I can't remember the last problem I had that would justify such effort.

You're right. I thought about it, and I'm really more trying to see how they work and what their differences are. It's surprising sometimes. They each have their strengths. For example, if you like Reddit answers, Meta AI seems to be best at that.

Sounds like you’d save a lot of time (and electricity) by just Googling in the first place.

Isn't Pieter Levels using AI to build his TherapistAI app with pretty good results?

I'm sure it's not perfect, but from what I've seen and heard about it, it seems to be a good springboard for basic "therapy" but not REAL prescribed therapy of course.

Sounds like it is helpful as an entry level to 'therapy' and is only getting better.

Arguably, maybe it's because Pieter is not using ChatGPT as his AI model.


I agree with the gist of the article, but using "mathematical" as an argument is silly: what do you think your own brain is doing?

Do you think that the brain “does math”?

Do you think that computers "do math"? (Forget about the usual abstractions programmers are used to thinking with; consider the physical level instead.)

I don’t know what “do math” means, that’s why I was wondering if you do.

This is a clunky argument. No problem description under 300 words elicits reasonable or empathetic advice. That's because GPT needs to use in-context learning to condition the output to your own frustrating distribution of life bullshit before that output can be empathetically constructive.

And that's what talking to a human expert is like too.

One of the most powerful things about GPT is I can give it a huge wall of text about a problem I'm having, then ask it to categorize, critique, prioritize, provide feedback, and ask questions. It invariably gives me a wall of text of empathetic and constructive questions and considerations for me to reflect on, much like a human expert would. I then answer those and reflect and converse in turn. 5-15 turns of conversation and a few thousand words later? The problem is freaking SOLVED in a profound and satisfying way. This experience happens over and over if I put the time and discussion in. This is something that DOESN'T happen for me with most humans, because they can't empathize with me as an autistic person. But GPT is fucking aces in this regard.

The article's author is LITERALLY just being a lazy prompt writer, ignoring basic prompting/ICL papers, and being a mushy critical thinker as a result. If he were a decent AI writer and diligent thinker, then he would be able to squeeze the juice and make the cocktail. But he bungles the article and runs it aground on the unfortunate "stochastic parrot w/ average priors" argument that everyone adopts if they ignore the insight that LLMs are in-context learners.

The real problem is he has not crafted any contexts eliciting enough in-context learning to let GPT empathize with any actual problem he has. "Help me make money" is different from "here's my business, its fundamentals, and a pain point my customer has: [...]. How can we solve this and rise above and beyond the call of duty here?"

It's like he goes up to a person, says, "yo what's up?", and the person says, "not much." The author then wonders, "Why didn't this person empathize with the fact that my mom is going through stage 4 cancer and I don't know WTF to do to support her or make peace with it!?!"

It's because you don't communicate enough context to empathize with! It would make amazing suggestions if you actually described the intense emotional turmoil that is alive inside you when you think about losing your mom. But you have to be RAW as FUCK about what you're really feeling! Or there's no "handle" for the AI to grip and rotate the issue around and help with.

And that is the author's fault, rather than GPT's. We know GPT has this limit. It needs ICL to be at its best. We know we have to make it into the expert we need by giving it extensive and descriptive contexts, and by critically co-evolving our thinking on the subject like we're working with a real expert.

When I do that, it's easily more helpful than any advocate, therapist, social worker, mentor, or manager I've ever had. And that's saying something profound and life changing.
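
A bare-bones sketch of that kind of loop, assuming the OpenAI Python client; the model name and the problem-dump file are placeholders, and the follow-up answers come from the user each turn.

    # Bare-bones sketch of the iterative "wall of text -> critique -> respond" loop.
    # Assumes the OpenAI Python client; model name and input file are placeholders.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {
            "role": "system",
            "content": (
                "Categorize, critique, and prioritize what the user describes, "
                "then ask clarifying questions before giving any advice."
            ),
        },
        # The huge wall of text about the problem, written up front.
        {"role": "user", "content": open("problem_dump.txt").read()},
    ]

    for turn in range(15):  # roughly the 5-15 turns described above
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
        ).choices[0].message.content
        print(reply)

        answer = input("\nYour response (empty to stop): ")
        if not answer:
            break
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": answer})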




