Pulumi AI is poisoning Google search results with AI answers (github.com/pulumi)
95 points by mooreds 17 days ago | hide | past | favorite | 57 comments



This is arguably more Google's fault than the content, obnoxious though it is. Google has been sacrificing search quality at the altar of search ads KPIs for a decade or more now and it shows.


It's totally Google's fault. The original PageRank algorithm was more resistant to low quality content - it was fundamentally about a reputation web of trust.

As Google shifted from valuing high quality sources referring to you to other signals, it became more gameable.

Note that the "high quality" part matters - it's harder to game getting, say, a link from real humans at a high quality place than it is to generate yet another SEO blogspam website with 10,000 backlinks from other SEO blogspam.
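For reference, the core PageRank idea the thread is debating can be sketched in a few lines. This is a minimal, illustrative power-iteration version, not Google's production ranking; the graph and node names ("hub", "spam", etc.) are hypothetical:

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy PageRank via power iteration.
    links maps each page to the list of pages it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in nodes}  # teleport share
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in nodes:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Hypothetical web: a small reputable cluster vs. a two-page link farm.
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "spam": ["spam2"],
    "spam2": ["spam"],
}
ranks = pagerank(graph)
```

Even in this toy graph the link farm accumulates real rank just by linking to itself, which is the gameability the comments above are describing.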


Links between sites stopped being as relevant to content quality ranking when the web became commercialized. Much of the high quality content (along with low effort junk) moved to commercial sites which mostly don't link to other sites. Or, if they do have external links it's because they're getting paid.


That's mainly Google's fault also, for running Adsense


I agree with the other comments that Google definitely has sacrificed search quality "at the altar of ad revenue KPIs".

However, I think your comments are fundamentally incorrect re "The original PageRank algorithm was more resistant to low quality content - it was fundamentally about a reputation web of trust. As Google shifted from valuing high quality sources referring to you to other signals, it became more gameable."

PageRank became unusable as a primary source of quality long ago due to stuff like link farms and the like. This is a classic example of Goodhart's Law, https://en.wikipedia.org/wiki/Goodhart%27s_law.

Again, I'm not saying Google couldn't have done better by focusing more on search quality and less on ad revenue, but the idea that the "reputation web of trust" can't be totally gamed by extremely motivated spammers is just flat out false in my opinion.


Right, but this gameability could have been reduced by leaning more into reputation scores across the PageRank-style calculation. Yes, this means some manual curation, which Google is allergic to, but it's also much cleaner long run.

I agree Goodhart's law is inevitable for Google search, but it's very hard to game "high reputation human cares about it". The high reputation human has to be bought out, but that should also eventually trigger a downrank of their reputation as a response.


How does Google determine what is high quality? That is in essence what they've changed.


The original PageRank algorithm wasn't set in stone. If it was, it would have become completely obsolete because it reflected the design of the 1990s internet. Back then, there weren't a million me-too sites on the same exact topic, all trying to capture the traffic of the #1 site in the segment. If you're looking for the Ultimate Beginner's Guide to Whatever In 2024, you have an embarrassment of riches to choose from. /s

PageRank also relied on backlinks as a quality signifier, assuming that more backlinks = more reputable. This has probably been the longest-lasting piece of the algorithm, as gaming this via "link-building" through ugly infographic embeds and blog post syndication on Medium.com, LinkedIn and Substack continues to be popular.


Yeah, I was going to say an alternate title: "Google returning low quality results for many searches."


Google is becoming unusable for programming related questions.

The telltale sign was when I had to actually put "MDN" in the search to get MDN to show up.


> Google is becoming unusable for programming related questions.

> The telltale sign was when I had to actually put "MDN" in the search to get MDN to show up.

That makes sense. It's not a sign that the search is poor, it's a sign that the search has been intentionally crippled.

It's clear evidence that google prioritises sites with adverts over sites with content. This could not have happened by accident.


For years I just used Google as the "I'm too lazy to search directly in Wikipedia" search box. Now I just properly use Wikipedia's search box. So thanks, AI?


you would think a company full of engineers working on web technology building a tool for finding info on the internet would understand and adjust for the fact that literally nobody on earth has ever wanted to go to w3schools, ever, at all, in the history of the entire universe.


Speak for yourself. The W3Schools site has a clean information architecture, short chapters on topics and working example code with a sandbox.

The MDN site on the other hand is a pure technical reference site, like PHP.net.

Considering that search works by way of matching search phrases to site text, it's not at all surprising that Stack Overflow and W3Schools rank higher than the reference manual sites.


No one who actually cares about the quality of the results is in charge. And for search results like the one given in the link (for "aws lightsail xray"), there are no ads, and thus the folks in charge of Google search these days probably prefer the results to be broken.


Like many companies that stop innovating, it's not the engineers who are in control. "Tonerheads" are. Raghavan has Search under his thumb. Ads is putting on pressure for revenue, and their incentives are not entirely aligned with Search.


I didn't know what "tonerheads" were. First Google search for the term gives a reddit link and enough of a summary to understand the meaning. Presumably people aren't trying to make money from that search...


Here's a YouTube video of a Jobs interview where the idea originated:

https://www.youtube.com/watch?v=Qo8zdPNMRdY


A while ago I started checking each highly ranked but low quality site in Google search for Google ads.

I'm never disappointed.

You can guess the answer for w3schools...

To be fair, a site like w3schools will worry much more about SEO than MDN, but I'm willing to bet that "shows our ads" has a positive impact on site ranking.

Low quality results that display their ads are the ideal scenario for Google:

1. User searches Google, sees some ads.

2. User visits low quality site, sees more Google ads.

3. User doesn't find useful information, goes back to search.

4. Repeat.


Oh but of course! If the site shows Google ads then it must be a high-quality, trustworthy site, so you see they'd be irresponsible to not use it! (/s)


I think there are two unrelated causes to this:

1. Marginally cleaning up low-volume technical search results is not going to get anyone promoted or a stock refresher.

2. Most of the programmers at Google are probably good enough not to have to Google programming questions.

And agreed about w3schools, expert sexchange, etc.


> good enough not to have to Google programming questions.

This is the most hilarious concept I've ever heard.


...or they're good enough to know not to use Google for programming questions (and go directly to StackOverflow)?


Generative AI could have a paradoxical effect of making slow, human generated content more valuable because the only alternative is free meaninglessness generated from older meaningless free content. The free, machine generated content would converge to something hypnotic and more and more addictive. The meaningful would be seen as valuable enough for people to pay. There would be no in-between.


This has already started to happen for manufactured goods.

Mass-manufactured and cheap but low quality goods are extremely available. People also will pay a premium to some local craftsperson who makes artisanal stuff at an extremely noncompetitive price, often with less technical quality in some regard.

The real need is to have trusted brands and harder to game information about those companies, so you can more quickly tell when a company starts submitting to the MBAs and eroding their quality.


People also will pay a premium to some local craftsperson who makes artisanal stuff

I won't, unless it's for the art aspect of it or to help a specific developer/artist that I want to see succeed.

But by default, I'll pay a premium for high-quality, very precise, and reliable consumer goods that undergo strict quality assurance processes, mass produced or not. If artisanal stuff has less quality, I'm not inclined to pay a premium for it in the general case.


>I'll pay a premium for high-quality very precise and reliable consumer goods that undergo strict quality assurance processes

How? Every time I find something like this, Private Equity scoops up this company and strips it for parts by the time I want to buy more.


Another side effect is a further loss of privacy if only poisoned content can be searched without being logged into a service that has your full info/credit card.

GPT-(n) pollutes the freely searchable internet to shit while a subscription to GPT-(n+1) is sold as the solution.


The end-game of Google-style surveillance capitalism has always been that privacy becomes the exclusive purview of the wealthy. AI is accelerating the trend, but it didn't start us on this path.


It's not helpful to use the fact that the world already sucks to handwave a major force multiplier for the elements that will make it suck much worse much more suddenly.


Then allow me to be clear that I'm not trying to handwave anything: Google needs to be eliminated, along with the rest of the ad industry.


Advertising is not likely to go anywhere but we do have an opportunity to nurture significant skepticism about AI before it's too late and becomes as entrenched in the world as Google and social media.


You just used "the world already sucks" to wave away their point just like you criticized in the comment they were responding to. :(


Good catch, but there is a consistent point I'm making. If the world sucks because of Google etc., then it's counterproductive to use that fact to downplay a new danger, and productive (in my opinion) to write the existing one off so as to focus on a new danger that's still possible to avoid.


I've noticed the same too. I was experimenting with Pulumi, so I had to look things up on Google several times. At first I appreciated that their documentation leans heavily on their LLM, but I soon figured out that it was plagued by the same issues as other LLMs, plus it was almost everywhere with no real way to escape it.

I ended up switching to Terraform because the quality of documentation was far superior.


Add "before:2023" to your search query
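For anyone scripting this, here's a small sketch of building such a query URL. The `dated_query` helper name is made up for illustration; the `before:` operator itself is a documented Google Search feature:

```python
from urllib.parse import quote_plus

def dated_query(terms, before="2023"):
    """Build a Google search URL with the before: date operator,
    which filters out pages indexed after the given date."""
    q = f"{terms} before:{before}"
    return "https://www.google.com/search?q=" + quote_plus(q)

url = dated_query("aws lightsail xray")
```

Handy if you want a browser keyword/bookmarklet that pre-filters out the post-LLM web.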


I want this idea fleshed out into a short story


I'm sure an LLM can accommodate your request.


This is the kind of pro tip I love seeing on HN. It's a shame that the tech field moves so fast and info gets outdated constantly, but at least for now, this helps.


The miasma cometh, followed swiftly by the death of the internet as it stands.

Authentication, verification, and curatorial products will be the next gold rush – and smart investors should be skating to where that puck is gonna be.


I wonder how many other companies are (quietly) guilty of the same thing.


I keep reading about stuff like this, but honestly I don't notice it in my day-to-day usage of Google at the moment. Are other people having the same experience? Does this actually matter at all? Is it affecting Google in any meaningful way?


w3schools getting bumped down to number two source of outdated, inaccurate results. Impressive.


I think w3schools should still be ranked highly. MDN seems better organized for people who understand how web development works and just need a good reference, but there are a lot of people, possibly many more than there are skilled developers, who don't understand web development but need to make something anyway.

Anecdotally, I watched someone learn front-end web development from scratch last year with a ton of googling. This person didn't understand the concepts at all (still doesn't) but w3schools seemed to help them the most to keep going and get their app made.

I suggested MDN but they found it to be confusing, not even in the "I'm not ready for this yet" sense, but in the "yeah this doesn't answer the question" sense. It did answer the question and they would've been greatly helped by taking the content seriously, but they were still far from knowing what they didn't know. I have to respect that kind of user needs to use google, too.


What are the issues with them in 2024? Like actual examples.

I don't do webdev, but the answers never seemed completely wrong whenever I had to look something up, and I know I've read praise in recent years about how they completely revamped the site and fixed the old "wrong" information. I know some people like to recommend other docs over them, but that's not saying they're wrong or bad, just not as good.


That's just the thing. You don't know if it was fixed or not, so you really can't trust the information given.

Their old PHP database material used to be utterly rife with SQL injection. Perhaps they've fixed some of that, but a single injection bug can ruin your day, so it's not something you can take as casually as they seem to.


It's really hard to fix a damaged reputation. And for a long time, the large majority of their information was horrible. It's going to take a rather long time for people to stop completely blacklisting them.


Why should we?

If you poison the well long enough, nobody's going to drink from it regardless of whether or not you can test the water and prove that it's clean.


Yeah, like sourceforge.

I just googled them and the top sub-result is for "AutoClicker download" which, I assume, is intended for click fraud. Luckily "Forge Auto Clicker comes with no ads or malware making an amazing user experience!"


It's not "wrong", it just has an awful signal/noise ratio.


Usually the answers were "right" in that they worked but also wrong in that they had issues or were otherwise less-than-ideal.

But I haven't used them in over a decade so lolidk


What is the benefit to Google of making their product continuously worse?

Is it so people will spend more time on Google and it allows Google to show more ads?

and/or

Does it prioritize sites that run Google ads over sites that actually match the search?


This is 100% on Pulumi, no? They had to make the intentional choice to make the AI results indexable by search engines.


Pulumi engineer here. We did, as an experiment (and labeled it as such), but we believed we'd have much more control over it than we did. After we began hearing these reports, we began taking them down. The vast majority have since been converted to return HTTP 410 with noindex directives, yet Google still hasn't responded. It's been frustrating for us also.
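For context, the removal pattern described here (410 Gone plus a noindex signal, which Google can read from an `X-Robots-Tag` response header) can be sketched as a tiny routing function. The `/ai-answers/` prefix is hypothetical, not Pulumi's actual URL scheme:

```python
def retire_response(path, retired_prefix="/ai-answers/"):
    """Return (status_code, extra_headers) for a request path.
    Retired pages get 410 Gone, a stronger removal hint than 404,
    plus X-Robots-Tag: noindex so crawlers drop them from the index."""
    if path.startswith(retired_prefix):
        return 410, {"X-Robots-Tag": "noindex"}
    return 200, {}
```

Even with this in place, deindexing depends on Google recrawling each URL, which matches the lag the commenter is describing.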


The experts-exchange of the new generation


There are no words for them. Manipulating SEO like this puts them on the same level as other scammers I've seen.


You can't poison poison.



