Google scrambles to manually remove weird AI answers in search (theverge.com)
283 points by rntn on May 25, 2024 | 368 comments


This approach of manually removing bad search suggestions reminded me of a different approach Google once took: when there were bad results, they weren’t satisfied with manually tweaking them but wanted to tweak the algorithm that produced them.

'Around 2002, a team was testing a subset of search limited to products, called Froogle. But one problem was so glaring that the team wasn't comfortable releasing Froogle: when the query "running shoes" was typed in, the top result was a garden gnome sculpture that happened to be wearing sneakers. Every day engineers would try to tweak the algorithm so that it would be able to distinguish between lawn art and footwear, but the gnome kept its top position. One day, seemingly miraculously, the gnome disappeared from the results. At a meeting, no one on the team claimed credit. Then an engineer arrived late, holding an elf with running shoes. He had bought the one-of-a kind product from the vendor, and since it was no longer for sale, it was no longer in the index. "The algorithm was now returning the right results," says a Google engineer. "We didn't cheat, we didn't change anything, and we launched."'

https://news.ycombinator.com/item?id=14009245


reminds me of this time we kept getting bugs in our app from a super old android phone from 2011. we could never reproduce it with any other hardware. There were only 4 users with this phone. We spent weeks trying to fix it but couldn't. I suggested we buy the 4 users a refurb phone from another brand. Would've cost like $300 total. Nope, not allowed. Something about not giving up as engineers.

We spent 3 weeks trying to fix it, which equaled $4500 in just my salary. we never ended up figuring it out.


A similar story:

https://issues.chromium.org/issues/41088357#comment32

Years ago I bought some Korean phone with a foot long antenna and TV tuner on eBay because it was crashing at a disproportionate rate. It was just the nature of Android development at the time.


That was an amazing read, how'd you come across it?


At the time I was working on a prototype of what would eventually be open sourced as Cronet, which was Chromium's http stack repackaged to be embedded in Android apps, so I monitored chromium bugs in the network stack component that mentioned android.

Cronet is still around, now open source and as far as I know is still the best all-around network stack for Android apps - fast, secure, supports modern protocols.


Spend say £500M (USD/GBP/EUR) on experts, per annum.

Imagine typing a search and getting a response: "Give us 30 mins to respond - here's a token, come back at 17:35 with your token" ... and then you get an answer from an expert, which also gets indexed.

The clever bit decides when to defer to an expert instead of returning answers from the index.

I'll leave the finer details out.


Google Answers was launched in 2002 and retired in 2006.

https://en.wikipedia.org/wiki/Google_Answers


"users would pay someone else to do the search."

My notion isn't a rehash of Google Answers. Google pays the "someone else", not you.


You assume that a modern day tech giant would hire an army of experts, instead of just outsourcing it to the lowest bidder in the current third world country of choice.


I live in hope.

I don't think that my notion is too daft - I generally find that quality might be a profit generator.

When you are so far up the arse of your financials that you can't see sense ... you are probably profitable for now until ... you are not.


That sounds like a way for Google to spend money, not make it.


Have you seen how much Google spends on their campuses and engineers?


Quora still exists. :-)


Quora is gamed to the point where it's pretty much useless.


So, replace Google with calling the DMV?


Sounds rather like how Google photos does not identify anything as a Gorilla.


It sounds like the exact opposite of that story. They manually blacklisted gorillas from being identified because they kept conflating black people with gorillas.


Google bought all the gorillas?


The solution is always the same: pay people off and keep it under the radar.

What stops the vendor, or other vendors, from creating more gnomes with sneakers. Easy money from customer with billions of dollars to spend on payola, fines, legal settlements, etc.

Maybe they made the vendor sign an NDA.


> The solution is always the same: pay people off and keep it under the radar.

You’re making this into a conspiracy unnecessarily. They didn’t “pay people off”, they bought an item. Do you “pay off” your grocer when you buy a carrot from them?

> What stops the vendor, or other vendors, from creating more gnomes with sneakers.

The fact they don’t know their entry was causing this issue to a major corporation?

> Maybe they made the vendor sign an NDA.

Why would they? Someone had one gnome with sneakers for sale; someone else bought it; end of story.


Google could buy all the wood glue in the world, but probably not all the rocks


Wow. Thank you for digging this up!


"Achieving the initial 80 percent is relatively straightforward since it involves approximating a large amount of human data, Marcus said, but the final 20 percent is extremely challenging. In fact, Marcus thinks that last 20 percent might be the hardest thing of all."

100% completely accurate is super-AI-complete. No human can meet that goal either.

No, not even you, dear person reading this. You are wrong about some basic things too. It'll vary from person to person what those are, but it is guaranteed there's something.

So 100% accurate can't be the goal. Obviously the goal is to get the responses to be less obviously stupid. Which, while there are cynical money-oriented business reasons for, it is obviously also a legitimate hole in the I in AI to propose putting glue on pizza to hold the cheese on.

But given my prior observations that LLMs are the current reigning world-class champions at producing good sounding text that seems to slip right past all our system 1 thinking [1], it may not be a great thing to remove the obviously stupid answers. They perform a salutary task of educating the public about the limitations and giving them memorable hooks to remember not to trust these things. Removing them and only them could be a net negative in a way.

[1]: https://thedecisionlab.com/reference-guide/philosophy/system...


I feel like there's some semantic slippage around the meaning of the word "accuracy" here.

I grant you, my print Encyclopedia Britannica is not 100% accurate. But the difference between it and a LLM is not just a matter of degree: there's a "chain of custody" to information that just isn't there with a LLM.

Philosophers have a working definition of knowledge as being (at least†) "justified true belief."

Even if a LLM is right most of the time and yields "true belief", it's not justified belief and therefore cannot yield knowledge at all.

Knowledge is Google's raison d'etre and they have no business using it unless they can solve or work around this problem.

† Yes, I know about the Gettier problem, but it is not relevant to the point I'm making here.


Encyclopedia Britannica is also wrong in a reproducible and fixable way. And the input queries are a finite set. Its output does not change due to random or arbitrary things. It is actually possible to verify. LLMs so far seem to be entirely unverifiable.


They don’t just seem it. They are by design.

We talk about models “hallucinating” but that’s us bringing an external value judgement after the fact.

The actual process of token generation works precisely the same. It’d be more accurate to say that models always hallucinate.


Yes - this is what I've been saying all the time. The term 'hallucinations' is misleading because the whole point of LLMs is that they recombine all their inputs into something 'new'. They only ever hallucinate outputs - that's their whole point!


Into something probable. The models that underlie these chatbots are usually overfitted, so while they usually don't repeat their training data verbatim, they can.


> The actual process of token generation works precisely the same

I’d be wary of generalising it like that, it is like saying that all programs run on the same set of CPU instructions. NNs are function approximators, where the code is expressed in model weights rather than text, but that doesn’t make all functions the same.


You misunderstand. I mean that the model itself is doing exactly the same thing whether the output is a “hallucination” or happens to be fact. There isn’t even a theoretical way to distinguish between the two cases based only on the information encoded in the model.


> it is like saying that all programs run on the same set of CPU instructions

A Turing machine is the embodiment of all computer programs. And then you come across the halting problem. LLMs can probably generate all books in existence, but they can't apply judgement to them. Just like you need programmers to actually write the program and verify that it correctly solves the problem.

Natural languages are more flexible. There are no function libraries or paradigms to ease writing. And the problem space can't be specified and usually relies on shared context. Even if we could have snippets of prompts to guide text generation, the result is not that valuable.


YES. Humans can hallucinate, it's a deviation from what is observable reality.

All the stress people are feeling with GenAI comes from the over-anthropomorphisation of ... stats. Impressive syntactic ability is not equivalent to semantic capability.


The human definition of hallucination has to do with sensory experience, i.e. inputs. Saying that LLMs hallucinate means that we're ascribing them control over their inputs that they simply do not have -- by design.

Or, in other words, if a chatbot really were hallucinating, it would probably start giving unprompted responses.

> Humans can hallucinate, it's a deviation from what is observable reality.

What is "observable reality" then, for an LLM? Its training set?


LLMs are completely deterministic even if that's kind of weird to state because they output things in terms of probabilities. But if you simply took the highest probability next word, you'd always yield the exact same output given the exact same input. Randomness is intentionally injected to make them seem less robotic through the 'temperature' parameter. Why it's not just called the rng factor is beyond me.
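
A minimal sketch of that distinction, assuming a toy three-word vocabulary with made-up logits (the names `LOGITS`, `greedy` and `sample` are hypothetical and not taken from any real model or library). Greedy decoding takes the argmax and never varies; temperature only matters once you sample from the softmax distribution.

```python
import math
import random

# Hypothetical next-token logits for a toy vocabulary (illustrative values only).
LOGITS = {"cheese": 2.0, "glue": 1.5, "rocks": 0.1}

def softmax(logits, temperature):
    """Turn logits into probabilities; low temperature sharpens, high flattens."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy(logits):
    """Always pick the highest-scoring token: same input, same output."""
    return max(logits, key=logits.get)

def sample(logits, temperature, rng):
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Greedy decoding repeated 100 times yields a single distinct token.
greedy_picks = {greedy(LOGITS) for _ in range(100)}

# Sampling at temperature 1.0 yields several distinct tokens over many draws.
rng = random.Random(0)
sampled_picks = {sample(LOGITS, 1.0, rng) for _ in range(1000)}
```

As the temperature approaches zero the sampled distribution collapses onto the argmax token, which is why "temperature 0" is commonly treated as shorthand for greedy decoding.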


Maybe some models can be deterministic at a point in time, but train it for another epoch with slight parameter changes and a revised corpus and determinism goes out the proverbial (sliding) window real quick. This is not unwanted per se, and the exact feedback loop that needs improving to better integrate new knowledge or revise knowledge artefacts incrementally/post-hoc.


If you train it then it's no longer the same model. If I have f(x) = x + 1 and change it to f(x) = x + 1 + 1/1e9, it would not mean that `f` is not deterministic. The issue would be in whatever interface I was exposing the f's at.


But current models must be retrained to incorporate new information. Or to attempt to fix undesirable behavior. So just freezing it forever does not seem feasible. And because there is no way to predict what has changed - one has to verify everything all over again.


Would you by extension argue that e.g. modern relational databases aren't deterministic in their query execution? Their query plans tend to be chosen based on statistics about the tables they're executed against, and not just the query itself.

I don't see how that's different than the LLM case, a lot of algorithms change as a function of the data they're processing.


At least in the case of BigQuery, I have fought with nondeterminism-like issues many times over, especially when dealing with window functions that aggregate floats from different compute nodes, where rows cannot be further sorted on a unique column (i.e. the maximum sorting granularity for rows with a similar column of interest to compute a window function over has been reached).

Inconsistent results could be resolved by introducing additional out of data constraints (e.g. incremental hashes), but it can take quite a while to figure out at which exact point in a complex query these constraints need to be introduced.

Beyond that, some functions might still produce different results between runs, e.g. `ml.tf_idf` and `ml.multi_hot_encoder` that take some approximation liberties. Whether these functions are relational in the traditional sense is up for debate.
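
The float aggregation issue is easy to reproduce in isolation. This is a sketch in plain Python, not BigQuery, and the values are chosen purely to make the effect visible: floating-point addition is not associative, so the order in which partial results arrive can change the total.

```python
import math

def naive_sum(values):
    """Add floats strictly left to right, with no error compensation.

    An explicit loop is used instead of the built-in sum(), which newer
    Python versions may compensate for rounding error.
    """
    total = 0.0
    for value in values:
        total += value
    return total

# The same four numbers, as if partial results arrived in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

result_a = naive_sum(order_a)  # the first 1.0 is lost to rounding next to 1e16
result_b = naive_sum(order_b)  # the small values combine before meeting 1e16

# math.fsum returns the correctly rounded sum, so ordering no longer matters.
stable_a = math.fsum(order_a)
stable_b = math.fsum(order_b)
```

Here `result_a` and `result_b` disagree even though the inputs are identical as a set, while the two `math.fsum` results match. Forcing a total order on the rows, as with the extra constraints described above, is the other way to get the same answer every run.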


I think what you're describing is that training/execution effects aren't predictable.

It is still "deterministic" in that training on exactly the same data and asking exactly the same questions should (unless someone manually adds randomness) lead to the same results.

Another example of the distinction might be a pseudo-random number generator: For any given seed, it is entirely deterministic, while at the same time being very deliberately hard to predict without actually running it to see what happens.
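
A small illustration of the PRNG case using Python's standard `random` module (the helper name `draw` is made up for this sketch): the same seed reproduces the same sequence exactly, yet nothing about the seed tells you what the sequence will be short of running the generator.

```python
import random

def draw(seed, count=5):
    """Return `count` pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

first = draw(42)
second = draw(42)  # same seed: identical sequence
other = draw(43)   # neighbouring seed: unrelated-looking sequence
```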


True in the ideal case, but taken together (e.g. corpus retraining, temperature settings, slight input changes, initial descent parameters) unpredictability and indeterminism become difficult to distinguish. Especially in the distributed training case, training data may be propagated to different nodes in different order (e.g. when leaving it to a query optimiser), which makes any large-scale training operation difficult to reproduce exactly.


I think you’re missing a subtlety with Markov chains. It’s not about picking the next word with the highest probability, it’s about picking the next word using the next-word probability distribution. I played with them almost 20 years ago, and the difference in output was pretty obvious even with simple trigrams. The poetry produced was just better.

I can’t imagine any modern LLM not using a probability distribution function for the same reason.
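
A rough sketch of the trigram version, assuming a tiny made-up corpus (everything here is illustrative, not a reconstruction of the original experiment). The same counts drive both modes: with no random generator the function takes the most frequent follower, otherwise it samples a follower in proportion to its count.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
CORPUS = ("the cat sat on the mat and the cat ran to the dog "
          "and the dog sat on the cat").split()

def build_trigram_counts(words):
    """Map each pair of consecutive words to a Counter of the words that follow."""
    counts = defaultdict(Counter)
    for first, second, third in zip(words, words[1:], words[2:]):
        counts[(first, second)][third] += 1
    return counts

def generate(counts, start, length, rng=None):
    """Extend `start` by up to `length` words.

    With rng=None the most frequent follower is always chosen (argmax);
    otherwise a follower is sampled in proportion to its count.
    """
    output = list(start)
    for _ in range(length):
        followers = counts.get((output[-2], output[-1]))
        if not followers:
            break  # context never seen in the corpus: stop early
        if rng is None:
            next_word = followers.most_common(1)[0][0]
        else:
            options = list(followers)
            weights = [followers[word] for word in options]
            next_word = rng.choices(options, weights=weights, k=1)[0]
        output.append(next_word)
    return " ".join(output)

COUNTS = build_trigram_counts(CORPUS)
greedy_text = generate(COUNTS, ("the", "cat"), 8)
sampled_texts = {
    generate(COUNTS, ("the", "cat"), 8, random.Random(seed)) for seed in range(20)
}
```

The argmax path produces one fixed string for a given start, while the sampled path produces a variety of continuations from the same counts.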


LLMs are deterministic if you want them to be. If you eagerly argmax the output you will get the same sequence for the same prompt every time


> LLMs so far seem to be entirely unverifiable.

I don't understand this complaint. Are they any less verifiable than a human?


I can ask a human to explain the steps they took to answer a question.

I can ask a human a question 100 times and I don't get back 100 different answers.

None of those applies to an LLM.


You can ask an LLM to explain itself. It will give you a logical stepwise progression from your question to its answer. It will often contain a mistake, but the same is true for a human.

And if your LLM is giving you 100 different answers, then it has been configured to do so. Because instead, it could be configured to never vary at all. It could be 100% reproducible if so desired.


> it will give you a logical stepwise progression from your question to its answer.

No, it will generate a new hallucination that might be a logical stepwise progression from the question you asked to the answer you gave, but it is not due to any actual internal reasoning being done by the LLM


We have no clear evidence the same isn't true for humans, and some that it might be. See experiments with split brain patients, that have shown the brain halves will readily explain how they made decisions they provably never made.


I think we have thousands of years of evidence that the same isn't true for humans

The fact that one human brain is composed of two half brains that each seem to be able to function fairly independently when separated doesn't seem like it changes that much


It changes that we can prove it. In as much as you can do experiments where you "hide" actions from one half, mess with it (e.g. totally change "choices" supposedly made by the brain) and ask the other half to explain why "it" made the choice, and it will do so, unaware that the choice was made by the experimenters. It won't go "sorry, but I don't know" or similar.


So what? You have no way to know for sure if the human you ask the same question does either. The question that started this thread was related to verifiability. And I still think it is a spurious complaint, given that we have exactly the same limitations when dealing with any human agent.


> You have no way to know for sure if the human you ask the same question, does either.

The human might lie, but they generally don't.

An LLM is always confabulating when it explains how it reached a conclusion, because that information was discarded as soon as it picked a word.

The limitations are not in the same ballpark.


We have no evidence that humans can even know, much less that they generally don't. And we do have evidence that there are situations where the brain will readily construct an explanation after the fact which can't possibly be true (experiments with split brain patients where researchers tricked one brain half into thinking the other half, and so the brain as a whole, had made a decision while the action was taken by the researchers, and made it explain how it has made the decision)

There is no basis for claiming to know that people don't usually make up explanations like this other than when e.g. breaking the process apart and writing it down step by step during. But even then individual decisions are "suspect".


The human may also be... wrong. Saying things that "feel right", except this one time they're factually wrong.

A human can explain their reasoning step by step, if the original reasoning was a System 2, formal, step-by-step process in the first place; otherwise, they're just making shit up after the fact, which feels right, but may or may not be correct (see also the previous paragraph).

Note that it's very rare anyone has an interaction with a human that uses this mode of reasoning - it's unnecessary except in special circumstances, usually math-heavy.


> And I still think it is a spurious complaint, given that we have exactly the same limitations when dealing with any human agent

We're not talking about an LLM that is trying to do the job of a human, here

We're talking about an LLM that is trying to give authoritative answers to any question typed into the Google search bar

It's already well past the scale that humans could handle

Talking about human shortcomings when discussing LLMs is a red herring at best, or some kind of deliberate goalpost shifting at worst


Nothing of the sort. I'm trying to understand why anyone cares about formal verifiability in this context, since it's not something we rely on when asking humans to answer questions for us. We evaluate any answer we get without such mathematical proofs, and instead simply judge the answer we're given on its fit and usefulness.

Anyone who doubts the usefulness of even these nascent LLMs is fooling themselves. The proof is in the pudding, they already do a great job, even with all their obvious limitations.


> since it's not something we rely on when asking humans to answer questions for us

Because we interact with computers (which includes LLMs) differently than we do with humans and we hold them to higher standards

Ironically, Google played a large part in this, delivering high quality results to us with ease for many years. At one point Google was the standard for finding high quality information


Shrug. Seems like clutching pearls to me. People seem to have an emotional reaction and obsess on the aspects that differentiate human cognition from LLMs. But that is a lot of wasted energy.

To the extent that anyone avoids employing these technologies, they will be at a disadvantage to those who do; because these tools just work. Already. Today.

There isn't even room for debate on that issue. Again, the proof is in the pudding. These systems are already successfully, usefully, and correctly answering millions of questions a day. They have failure modes where they produce substandard or even flat out incorrect answers too. They're far from perfect, but they're still incredible tools, even without waiting for the improvements that are sure to come.


The reason verifiability is important is because humans can be incentivized to be truthful and factual. We know we lie, but we also know we can produce verifiable information, and we prefer this to lies, so when it matters, we make the cost to lying high enough that we can reasonably expect that they will not try to deceive (for example by committing perjury, or fabricating research data). We know it still happens, but it’s not widespread and we can adjust the rules, definitions and cost to adapt.

An LLM does not have such real world limitations. It will hallucinate nonstop and then create layers of gaslighting explanations to its hallucinations. The problem is that you absolutely must be a domain expert at the LLM’s topic or always go find the facts elsewhere to verify (then why use an LLM?).

So a company like Google using an LLM, is not providing information, it’s doing the opposite. It is making it more difficult and time consuming to find information. But it is then hiding their responsibility behind the model. “We didn’t present bad info, our model did, we’re sorry it told you to turn your recipe into poison…models amirite?”

A human doing that could likely face some consequences.


The problem of other minds is no reason to throw everything out the window. Humans are capable of being conscious of their reasoning processes; token-at-a-time predictive text models wired up as chatbots aren't capable of it. Your choice is between a possibly-mistaken, possibly-lying human, and a 100%-definitely incapable computer program.

You don't know either "for sure", but you don't know that the external world exists "for sure" either. It's an insight-free observation, and shouldn't be the focus of anyone's decision-making.


When you ask an LLM to carefully reason step by step before arriving at its answer, that seems pretty much the same as conscious reasoning to me. Of course, when asked to justify a gut reaction after the fact, it will just come up with something that sounds plausible (and may or may not be true). Just like humans do.


You've made some interesting points, which are debatable, for sure. But you've failed to address the question being asked about "verifiability".


> It will often contain a mistake...but the same is true for a human.

If this were true textbooks could not work. Given a question, we don't consult random humans but experts of their field. If I have a question on algorithms, I might check a text by Knuth, I wouldn't randomly ask on the street.

> It could be 100% reproducible if so desired.

Reproducible does not mean better. For harder questions, it's often best to generate multiple answers at a higher temperature than to greedily pick the highest probability tokens.


And with disturbing frequency, that human explanation is likely a complete fabrication after the fact. See experiments on split brain patients.

With respect to repeatability, yes, LLMs are currently frozen in time. That is not an inherent limitation, but it is one that is practical for a lot of uses and a problem for some.


Isn't it actually known that every time a human brain recalls a piece of memory the memory gets slightly changed?

If the answer has any length at all, I imagine the answer can vary every single time the person answers, unless they prepared for it, memorized it word by word.


It's also known that the brain is prone to outright constructing demonstrably fictional rationalisations of decisions it's never made.

Any notion that we're reliable narrators of our own thoughts and actions is fiction.


Right, that makes sense, our brain will do a quick black box judgment (some may call it system 1), and then rational process only works to justify that or explain the black box, assuming that black box is always correct (depending on the person and how much they trust their black box or system 1).

So system 2 is "hallucinating" the best justification for system 1.

And usually system 2 will do it only when it's required to justify it for anyone else.


The only reason you can't verify a server side LLM is you can't see the model. It is possible to look at its activations if you have the model.


Do the activations tell you anything more than what the LLM delivers in plain text? Other than for trivial bugs in the LLM code, I don't think so.


Yes, "making up an answer" will look different from "quoting pretrained knowledge" because eg the model might've decided you were asking a creative writing question.


Can you cite a source for this, or are you speculating?

My understanding was the opposite -- that the activity of a confabulating LLM is indistinguishable from one giving factually accurate responses.

https://arxiv.org/abs/2401.11817


Some things like:

https://arxiv.org/abs/2310.18168

https://arxiv.org/abs/2310.06824

There are various reasons an LLM might have incorrect "beliefs" - the input text was false, training doesn't try to preserve true beliefs, quantization certainly doesn't. So it can't be perfectly addressed, but some things leading to it seem like they can be found.

> https://arxiv.org/abs/2401.11817

This seems like it's true since LLMs are a finite size, but in Google's case it has a "truth oracle" (the websites it's quoting)… the problem is it's a bad oracle.


This is confidently stated and incorrect.


Do you have anything to add?


Sure. Your claim has some truth, but is far too strong.

The articles you cited upthread do not support the notion that models consistently activate differently when generating true facts vs false facts.

It is true that models can capture some notion of reliability based on patterns in their training data. For a concrete example, it is entirely plausible that a model can capture the sense that data trained from Reddit is less truthy than data trained from Wikipedia, or that training data with poor grammar and vocabulary is less reliable than more sophisticated inputs.

But this process is not a guarantee, and does not change the fact that LLMs have no mechanism to track the provenance of information. It's probably a fruitful direction of research for reducing the probability of emitting false facts, but there will always be an infinite number of marginal cases for which the activations for true facts are indistinguishable from those for false facts.

Models simply do not track the provenance which is required to make this distinction in every case.


I agree with this; that's why I was careful not to use the examples you mentioned. Quoting incorrect training knowledge would be an unavoidable issue if your probe can only say "it's quoting something", and as far as I know it can't do better than that.

But I have seen issues with prompts where a creative-writing prompt and just asking a question look similar, and in that case it could help to know which one it thinks it's doing.

Gemini itself has a funny verification button where it more or less Googles every sentence the model writes and tells you if it seems like it made it up or not.


Ask a human what the meaning of life is and how it impacts their day to day interactions. I know I can tell you an answer but I couldn’t tell you steps about how I got it.

And if you asked it to me twice I’d definitely give different answers unless you told me to give the same answer. In part I’d give a different answer because if someone asks me the same question twice I assume the first answer wasn’t sufficient.


No one is talking about existential questions about the meaning of life.

We are talking about basic things like whether or not to eat rocks or put glue in recipes. We can answer those questions with a chain of logic and repeatability.


And those specific questions get repeatable answers on ChatGPT for me.

Here are two answers I got which seem as close as you’d expect any human to give:

“No, people should not eat rocks. Rocks are not digestible and can cause serious harm to the digestive system, including blockages and damage to internal organs. Eating rocks can lead to severe health problems and should be avoided.”

“No, people should not eat rocks. Rocks are not digestible and can cause serious harm to the digestive system, including blockages, abrasions, and potential poisoning from harmful minerals or substances. It's important to consume only food items that are safe and meant for human consumption.”


> it’s not /justified/ belief

Beliefs derived from the output of LLMs that are ‘right most of the time’ pass one facially plausible precisification of ‘justification’ in that they are generated by a reliable belief-generation mechanism (see e.g. Goldman). To block this point one must engage with the post-Gettier literature at least to some extent. There is a clear difference between beliefs induced by reading the outputs of LLMs and those induced by the contents of a reference work, but it is inessential to the point and arguably muddies the water to present the distinction as difference in status as knowledge or non-knowledge.


Upon a second reading, this is an excellent point.

For the sake of clarity, let's remove LLMs from the equation and posit the existence of Encyclopedia Eric. Ask Eric any question, and he will happily research it and come back to you with the answer. But he can sometimes be sloppy in his research, and he gives the correct answer only X percent of the time.

Furthermore, Encyclopedia Eric steadfastly refuses to cite his own sources or explain his reasoning in any way. He simply states his answer.

Can Eric be a source of knowledge? It seems evident that the answer is no, for low values of X. For higher values of X, the question becomes murkier.

The temptation at this point is to give up on defining knowledge at all and fall back on a sort of Bayesian epistemology where everything is ultimately a matter of probabilities.

Yet there does seem to be a distinct practical difference between a knowledge source that is "traversable" (like a standard encyclopedia) vs a knowledge source that is not (like Eric.) Is that part of the definition of knowledge? You're right, that is at least a Gettier adjacent question.

I think we can all agree that for current LLMs the value of X is definitely too small to count as knowledge.


I am glad that you think it an excellent point!

I think that this might be a nice way of getting round my objection, but there is one worry, which is that X is relative to a distribution on the questions we ask when we aren’t dealing with Encyclopedia Eric but with an LLM. I don’t actually use LLMs very much myself, partly out of arrogance and Luddite tendencies. But I suspect that the value of X for some sorts of questions (simple quiz questions, maybe?) and some LLMs (maybe not Google’s) will be high enough to end up in the murky case.

Of course, both you and I agree that there /is/ clearly a difference. I can see the attraction of appealing to the intuitive or pretheoretic notion of knowledge, since it’s a fairly straightforward way of stating the difference and it’s not obvious how else one might put it (I suppose ‘LLMs don’t explicitly think through the facts stored when deciding what to say’ is one way of putting it.)

I remember some time ago rather sleepily watching John Hawthorne talk about conditionals; my sole memory was of his banging on about ‘the little logician in the brain’ (I think the point was something like: some conditionals seem [in]felicitous in virtue of form because the little logician in the brain is reading them; others seem infelicitous because we examine them more closely and look e.g. at the referents involved, in which case e.g. Gricean considerations apply). One difference at least in the case of LLMs that makes sense to me is that there is no ‘little logician’ in LLMs.


> X is relative to a distribution on the questions we ask when we aren’t dealing with Encyclopedia Eric but with an LLM.

Assuming I understand what you mean here correctly, this should be the case for both LLMs and Encyclopedia Eric - there are topics Eric knows by heart (or thinks he knows); there are specific phrases seared into his mind through sheer exposure during his life prior to becoming a living Encyclopedia. There are words he's used to, and exact synonyms he barely recognizes. All that means your chance of getting a correct answer to your query depends, in a complex way that is unknown to you, on how you state it.


I think the thought experiment can be set up both ways. In one case Eric has a fixed probability of getting /any/ query right. (This might leave boosting open so we might want to gerrymander repeated queries out.) In the other this is relative to the distribution of queries.
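
To make the boosting worry concrete, here is a rough Python sketch. It is an illustration under simplifying assumptions that the thread does not state: the question is yes/no, each repetition is answered independently, and each answer is right with the same fixed probability p. The helper name `majority_correct` is made up for this example.

```python
from math import comb

def majority_correct(p: float, n: int) -> float:
    """Probability that the majority of n independent answers is right.

    Assumes a binary question, an odd n (so there are no ties), and a
    fixed per-query accuracy p. These are illustrative assumptions only.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    # Sum the binomial probabilities of more than n/2 correct answers.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Asking the same question repeatedly and taking the majority answer.
for n in (1, 3, 11, 51):
    print(n, round(majority_correct(0.8, n), 4))
```

With p = 0.8, the majority of three answers is right about 89.6% of the time, which is why repeated queries would need to be ruled out if X is meant to stay fixed.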


A relevant point to this is the notion of "System-1" vs "System-2" thinking. Somewhat dubious when applied to actual human psychology, but I think a valid metaphor for how LLMs work: they are only capable of System-1 thinking; a single forward pass through the weights of intuition.

In my actual life, I don't trust my own System-1 thoughts: for anything important, I'm always going to engage System-2. And LLMs don't have a System-2.

(I also agree that when dealing with LLMs the value of X is not a single value but a highly complex space depending on the nature of the question and the training data of the model. In my mind it does not change the epistemological equation, it just means that even the value of X itself is harder to "know", so this ambiguity can only ever make LLMs a less viable source of knowledge.)


I am not sure that the ambiguity has to work that way. The suggestion I am making is that if we fix the distribution on questions and the training data, we might (a) know the value of X in this specific case, (b) be able to ensure that it is fairly high.

I’d say this is the murky case because on the fixed training data and query distribution X ≈ 1 and we know that even though we don’t know the value of X on other training data and other query distributions. I think that might be where the disagreement lies.


To be clear he is saying that the LLM is not capable of justified true belief, not commenting on people who believe LLM output. I don’t think your comment is relevant here.


I do think trusting an LLM is less firm ground for knowledge than other ways of learning.

Say I have a model that I know is 98% accurate. And it tells me a fact.

I am now justified in adjusting my priors and weighting the fact quite heavily at .98. But that’s as far as I can get.

If I learned a fact from an online anonymously edited encyclopedia, I might also weight that at 0.98 to start with. But that’s a strictly better case because I can dig more. I can look up the cited sources, look at the edit history, or message the author. I can use that as an entry point to end up with significantly more than 98% conviction.

That’s a pretty important difference with respect to knowledge. It isn’t just about accuracy percentage.
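
A small worked sketch of that "dig more" step, using the odds form of Bayes' rule. The likelihood ratios below are invented for illustration, and the checks are assumed to be independent of each other; neither assumption comes from the thread.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability after one piece of evidence (odds form of Bayes' rule)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

belief = 0.98  # starting weight, the same for the model and the encyclopedia

# Hypothetical follow-up checks that only the traversable source allows.
# A ratio of 19 means the observation is 19 times more likely if the
# fact is true than if it is false (made-up numbers).
checks = [("cited source agrees", 19.0), ("edit history looks clean", 3.0)]

for name, ratio in checks:
    belief = update(belief, ratio)
    print(f"{name}: {belief:.4f}")
```

With an opaque model there are no such checks to run, so the belief stays where it started.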


That reading of the comment did occur to me, but I think neither dictionaries nor LLMs are capable of belief, and the comment was about the status of beliefs derived from them.


Okay we are speaking past each other, and you are still misunderstanding the subtlety of the comment:

A dictionary or a reputable Wikipedia entry or whatever is ultimately full of human-edited text where, presuming good faith, the text is written according to that human's rational understanding, and humans are capable of justified true belief. This is not the case at all with an LLM; the text is entirely generated by an entity which is not capable of having justified true beliefs in the same way that humans and rats have justified true beliefs. That is why text from an LLM is more suspect than text from a dictionary.


I think the parent comment ultimately concerned the reliability of /beliefs derived from text in reference works v text output by LLMs/, and that seems to be what the replies by the commenter concern. If the point is merely that the text output by LLMs does not really reflect belief but the text in a dictionary reflects belief (of the person writing it), it is well-taken. Since it is fairly obvious and I think the original comment really was about the first question, I address the first rather than second question.

The point you make might be regarded as an argument about the first question. In each case, the ‘chain of custody’ (as the parent comment put it) is compared and some condition is proposed. The condition explicitly considered in the first question was reliability; it was suggested that reliability is not enough, because it isn’t justification (which we can understand pretheoretically, ignoring the post-Gettier literature). My point was that we can’t circumvent the post-Gettier literature because at least one seemingly plausible view of justification is just reliability, and so that needs to be rejected Gettier-style (see e.g. BonJour on clairvoyance). The condition one might read into your point here is something like: if in the ‘chain of custody’ some text is generated by something that is incapable of belief, the text at the end of the chain loses some sort of epistemic virtue (for example, beliefs acquired on reading it may not amount to knowledge). Thus,

> text from an LLM is more suspect than text from a dictionary.

I am not sure that this is right. If I have a computer generate a proof of a proposition, I know the proposition thereby proved, even though ‘the text is entirely generated by an entity which is not capable of having justified true beliefs’ (or, arguably, beliefs at all). Or, even more prosaically, if I give a computer a list of capital cities, and then write a simple program to take the name of a country and output e.g. ‘[t]he capital of France is Paris’, the computer generates the text and is incapable of belief, but, in many circumstances, it is plausible to think that one thereby comes to know the fact output.
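
The capital-city program described above might be sketched in Python roughly as follows. The table contents and the name `capital_sentence` are invented for illustration, and the entries are assumed to have been typed in correctly by a person.

```python
# Hypothetical hand-entered table; its reliability comes entirely from
# whoever compiled it, not from the program.
CAPITALS = {
    "France": "Paris",
    "Japan": "Tokyo",
    "Mongolia": "Ulaanbaatar",
}

def capital_sentence(country: str) -> str:
    """Generate a sentence about a country's capital from the table."""
    capital = CAPITALS.get(country)
    if capital is None:
        # The program says so when it has no entry instead of guessing.
        return f"I have no entry for {country}."
    return f"The capital of {country} is {capital}."

print(capital_sentence("France"))
```

The text is generated by something incapable of belief, yet the mechanism is transparent enough that a reader can trace each output back to the human-supplied table.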

I don’t think that that is a reductio of the point about LLMs, because the output of LLMs is different from the output of, for example, an algorithm that searches for a formally verified proof, and the mechanisms by which it is generated also are.


+1, AIs don’t really “understand” anything, like human anatomy and deformity rates and societal norms when tasked with image generation, so you get hands with weird numbers of digits and other topological errors which even a very unintelligent human wouldn’t make. AI doesn’t understand and build knowledge in interconnected layers, it can’t “think” and link things back to first principles, it can’t really reason about things, and it’s not going to get significantly better until we start approaching it differently. This generation of AI might be useful for some things, but it’s being applied wayyyy too broadly too quickly and I see a big pullback coming.

“Expert systems” are not a new thing, and they’re still not all that useful except in some very small niches. Phone trees that can keyword match FAQs are useful for a lot of low-effort callers who put zero effort into solving their problem on their own first, but frustrating for callers who are only calling because there’s literally no other way for them to resolve their issue. Unfortunately for consumers, the cost is very low for businesses to make everyone wade through junky phone tree systems and penalize anyone who tries to mash zero to talk to a real person, even if that’s the only thing which will actually help them.


The Gettier problem is an indication that the definition has (at least) a bug.

There are other formulations of "knowledge" which do not involve justification; see e.g. Gnosticism.

Of course, for a publicly available frequently used service, the "JTB" formulation of knowledge is probably the only one we can practically use, but this kind of indicates that the whole idea of search engines, knowledge systems, or expert systems is flawed due to the Gettier problem.


> So 100% accurate can't be the goal. Obviously the goal is to get the responses to be less obviously stupid.

I'm not sure I agree. I think you're right that 100% accuracy is potentially infeasible as a realistic aim, but I think the question is how accurate something needs to be in order to be a useful proposition for search.

AI that's as knowledgeable as I am is a good achievement and helpful for a lot of use cases, but if I'm searching "What's the capital of Mongolia", someone with averageish knowledge taking a punt with "Maybe Mongoliana City?" is not helpful at all. If I can't trust AI responses to a high degree, I'd much rather just have normal search results showing me other resources I can trust.

Google's bar for justifying adding AI to their search proposition isn't "be better than asking someone on the street", it's "be better than searching Google without any AI results".


The problem is that in all the shared examples, Google AI search does not respond with a "Maybe xyz?" like you did. It always answers with high confidence and can't seem to navigate any gray area where there are multiple differing opinions or opposing sources of truth.


Yeah, the "manipulating language cogently is intelligence" premise that underlies this "AI" cycle is proving itself wrong in a grand way.


I should have been more clear. I am referring to Google's goals. Humanity as an abstract concept or you personally may have other goals, but, well, perhaps I am a cynic, but I think Google's goals are rather more monetary and less idealistic than they would represent. They don't want or need (as other replies correctly point out) the AI to always be correct and accurate. Along with general cynicism with regard to any conceivable AI's ability to do that, it is also fair to point out that the web itself doesn't have that ability either. We can't even find an objective yardstick to measure an AI with that way. Google's goal is to make the bad press go away so people use the AI more so that in the indefinite but ideally near future this AI can be monetized somehow to justify the interstellar valuations being ascribed to this technology in the "if it isn't happening in two fiscal quarters or less it might as well not exist" US/Western financial markets.


You're looking at it the wrong way, the goal should be 0% inaccurate. Meaning for the 20% of things it can't answer, it shouldn't make something up.


Nothing can be sure that it doesn’t have inaccurate or incomplete knowledge. So that can’t be a goal either.


I think the biggest (and most important) difference from humans is that a human can tell you "I have no idea, this isn't my field" or "I'm just guessing here", but LLMs will confidently make super stupid statements. AI doesn't know what it knows.

If you only scored the cases where humans provide an answer, the human score would probably be in the high 90s.


> human can tell you "I have no idea, this isn't my field" or "I'm just guessing here"

I wish more of them would lol


I find irony here.


Yes, which is why the ability to sift accurate and authoritative sources from spam, propaganda, and intentionally deceptive garbage, like advertising, and present those high-quality results to the user for review and consideration, is more important than any attempt to have an AI serve a single right answer. Google, unfortunately, abandoned this problem some time ago and is now left to serve up nonsense from the melange of low-quality noise they incentivized in pursuit of profits. If they had, instead, remained focused on the former problem, it’s actually conceivable to have an LLM work more successfully from this base of knowledge.


Or to put it another way, I think Google should have a way of saying "yes, we know this result is wrong, but we're leaving it in because it's funny."

There is a demand for funny results. Someone asking “how many rocks should I eat” is looking for entertainment, so you might as well give it to them.


The right answer is no rocks. Some mentally ill person could type that in and get "eat 1000 rocks" and then die from eating rocks, and that would be Google's fault. It's not funny. I have no doubt right now there are at least 50 youtube videos being made testing different glues' effectiveness at holding cheese on a pizza. And some of those idiots are going to taste-test it, too. And then people will try it at home, some stupid kids will get sick - I have no doubt.

It was a bit premature to label LLMs as "Intelligence", it's a cool parlor trick based on a shitload of power consumption and 3D graphics cards, but it's not intelligent and it probably shouldn't be telling real (stupid) humans answers that it can't verify are correct.


Google is not responsible, and should never be responsible, for protecting mentally ill people from themselves. It would be at a severe detriment to the rest of us if they took on that responsibility. Society should set the bar to “a reasonable person”, otherwise you’re doomed, with no possible alternative to a nanny state.


It's not only mentally ill people that are at risk, but anyone that doesn't know it's not a good idea to put "non-toxic" glue in pizza cheese. That includes a lot of not-mentally-ill but just plain dumb people. Google didn't need to tell people that glue+pizza is a reasonable thing to do, or even just a thing. It sure did frame it like it was a legitimate response. And Google didn't even have to reply with this or anything else, they could have just supplied the links to other sites where it had been suggested, but no - they have to make a show of force with their premature foray into AI, and have it tell real people all kinds of false, and possibly dangerous things. That's an unforced error by Google that they could end up being prosecuted for.


That's what parents and mentors are for. We as a society should not have to break our backs bending over backwards to stop people from doing stupid things. People can make their own decisions and be responsible for them. If they lack proper guidance, well that just sucks.


>That's what parents and mentors are for.

It's nice that you have a parent that cares about how you are raised, or "a mentor". Do you realize that not everyone has that?

>We as a society should not have to break our backs bending over backwards to stop people from doing stupid things.

Sure, let's take down all the speed limits and see what happens. Let's tell people it's an option to wear seatbelts and see what happens. Let's deregulate everything and hope for the best. Sounds reasonable?

>People can make their own decisions and be responsible for them. If they lack proper guidance, well that just sucks.

There's this thing called "human nature". I think you should do some reading about it.


Sure. But we have to keep the bar somewhere reasonable, otherwise you won’t be free to make mistakes.


They absolutely should bear responsibility for authoritatively telling people wrong and unsafe cooking temperatures for meats or spreading lies about people on their results page, in their own voice. "It's just a random text generator!" isn't a protection against, say, gross libel.

They pass the threshold of criminal negligence when they keep up a system that they know will actively mislead countless people in subtly dangerous ways. The problem being hard or the tech being fundamentally unsound doesn't wave away their culpability - if anything it destroys any reason why they should be given some leeway.


It's funny how people on reddit think that these LLMs will somehow become AGI in the next year, or when OpenAI releases GPT-5.

The reality, though, is that there is currently no known path to a true AGI system, and the research still needs to be done. No one knows how to build this kind of system yet. LLMs are nice for things like roleplaying, helping with code stuff etc. but they are far from all that the marketers hype them to be.


It's going to end up being another bubble, and it will eventually burst. Most people, investors included, don't really know why LLMs aren't going to be able to reason about and solve all of humanity's problems, so they go on believing it. It's just a matter of time before the money runs out, or gets shifted towards some new, shiny thing.


> The right answer is no rocks.

Sand is considered a "rock". If you live in e.g. the USA or the EU you've definitely inadvertently eaten rocks from food produce that's regulated and considered perfectly safe to eat.

It's impossible to completely eliminate such trace contaminants from produce.

Pedantic? Yes, but you also can't expect a machine to confidently give you absolutes in response to questions that don't even warrant them, or to distinguish them from questions like "do mammals lay eggs?".


Salt is rock, and most everyone eats plenty of that.

The LLM is clearly being dumb, but the underlying science of the question is actually interesting. Iron is another interesting one. Run a magnet through iron-fortified cereal.


This is not a serious reply.


> Or to put it another way, I think Google should have a way of saying "yes, we know this result is wrong, but we're leaving it in because it's funny."

These specific results aren't the problem, though. They're illustrations of a larger problem -- if a single satirical article or Reddit comment can fool the model into saying "eating rocks is good for you" or "put glue in your pizza sauce", there are certain to be many more subtle inaccuracies (or deliberate untruths) which their model has picked up from user-generated content which it'll regurgitate given the right prompt.


Yes of course, but maybe keeping the funny ones around might serve as a warning, if suitably marked. A public service message?

They have disclaimers, but a funny message is more likely to be read.


100% accuracy should be the goal, but the way to achieve that isn't going to come from teaching an AI to construct a definitive-sounding answer to 100% of questions. Teaching AI how to respond with "I don't know", and to give confidence scores, is the path to nearing 100% accuracy.
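
A toy Python sketch of the abstention idea. It assumes the system can attach a reasonably calibrated confidence score to each candidate answer, which is itself a hard problem and not something established in this thread. The `Guess` type, the threshold, and the sample data are all invented for illustration.

```python
from typing import NamedTuple, Optional, Sequence

class Guess(NamedTuple):
    answer: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]
    correct: bool      # ground truth, known only in this toy evaluation

def respond(guess: Guess, threshold: float = 0.9) -> str:
    """Answer only when confident enough; otherwise abstain."""
    return guess.answer if guess.confidence >= threshold else "I don't know"

def accuracy_when_answering(
    guesses: Sequence[Guess], threshold: float = 0.9
) -> Optional[float]:
    """Accuracy over the questions actually answered (None if all abstained)."""
    answered = [g for g in guesses if g.confidence >= threshold]
    if not answered:
        return None
    return sum(g.correct for g in answered) / len(answered)

guesses = [
    Guess("Ulaanbaatar", 0.99, True),
    Guess("No rocks", 0.95, True),
    Guess("1/8 cup of glue", 0.40, False),
    Guess("Mongoliana City", 0.30, False),
]

print([respond(g) for g in guesses])
print(accuracy_when_answering(guesses))
```

The trade-off is coverage: a higher threshold means fewer wrong answers but also more "I don't know" responses, and the whole scheme is only as good as the confidence scores.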


> You are wrong about some basic things too

Sure, but probably not "add glue to pizza to get the cheese to stick" wrong...


The thing about that is that polyvinyl acetate, which is what's in Elmer's glue, is also used in chewing gum and chocolate, and to make the surface of apples shinier, so you're probably eating glue; we just don't like to call it that. Emulsifier is a better description.


At least it suggested non-toxic glue… That suggests some context about recipes needing to be safe is somehow present in its model.


Most likely this has nothing to do with "recipes being safe" being in the model

It seems the glue thing comes from a reddit shitpost from some time ago. There's a screenshot going around on twitter about it[0] (it says 11 years in the screenshot, but no idea when it was taken).

It specifically mentions "any glue will work as long as it is non-toxic", so best guess is that's why Google output that.

[0]https://x.com/kurtopsahl/status/1793494822436917295?t=aBfEzD...


It is indeed from 11 years ago. Here's a direct link to the Reddit post: https://www.reddit.com/r/Pizza/comments/1a19s0/my_cheese_sli...


Thankfully a billion people are not asking me for answers to things, so it's OK if I am wrong sometimes.


Nor am I being treated as an omniscient magic black box of knowledge.

Hilariously though, polyvinyl acetate, the main ingredient in Elmer's glue, is used as a binding agent to keep emulsions from separating into oil and water, and is used in chewing gum, and covers citrus fruits, sweets, chocolate, and apples in a glossy finish, among other food things.


If I could deliver “80% correct” software for my workplace, my day would be a whole hell of a lot easier.


> putting glue on pizza to hold the cheese on

It's actually not the dumbest idea I've heard from a real person. So no surprise it might be suggested by an AI that was trained on data from real people.


It wasn't an idea, though. It was a joke someone made on Reddit. If an AI can't tell the difference, it shouldn't be responsible for posting answers as authoritative.


Insane people at Google thought it would be a good idea to let Reddit of all places drive their AI search responses


Reddit is a magnificent source of useful knowledge.

r/AskHistorians r/bikewrench

To name just two. There is nothing even remotely comparable.

But you need to be able to detect sarcasm and irony.


I have seen a tremendous amount of bad advice on bikewrench.


But a lot of great advice.

I became a half decent home bike mechanic through reading it, and of course Park Tool videos.


...which is sometimes incredibly hard, and it might not be possible because it's such a niche topic or people might be just wrong. Just thinking about urban myths, conspiracy theories etc., where even without a niche factor things may sound unbelievable but actually disproving them can take effort that is out of proportion.


I don't know about bikewrench but AskHistorians is a useful source of knowledge because it is strongly moderated and curated. It's not just a bunch of random assholes spouting off on topics. Top level replies are unceremoniously removed if they lack sourcing or make unsourced/unsubstantiated claims. Top level posters also try to self-correct by clearly indicating when they're making claims of fact that are disputed or have unclear evidence.

OpenAI, Google, and the other LLMs-are-smart boosters seem to think because the Internet is large it must be smart. They're applying the infinite monkey theorem[0] incorrectly.

[0] https://en.m.wikipedia.org/wiki/Infinite_monkey_theorem


In general, I have trouble trusting environments that can be described as "strongly moderated and curated".

I find that environments that rely on censorship tend to foster dogma, rather than knowledge and real understanding of the topics at hand. They give an illusion of quality and trustworthiness. It's something we see happen at this site to some extent, for example.

I'd rather see ideas and information being freely expressed, and if necessary, pitted against one another, with me being the one to judge for myself the ideas/claims/positions/arguments/perspectives/etc. that are being expressed.


Your comment is orthogonal to the quality of the AskHistorians subreddit. AskHistorians' moderation tends towards curating posts for following the rules rather than for content. There are often competing narratives on questions where there's academic dispute of facts.

Regardless of whether you think that's the right approach to moderation, top level posts are sourced and can at least be examined. It's a marked improvement over the unsourced musings of random Redditors.


It is certainly popular here to run your web searches against reddit. Every post about how Google Search sucks ends up with comments on appending "site:reddit.com" to the search terms.


Yes, and we as humans filter through the noise. But you cannot rely upon it as a source for anything truthful without that filtering. Reddit is very, very, very context dependent and full of irony, sarcasm, jokes, memes, and confidently written incorrect information. People love to upvote something funny or culturally relevant at a given time, not because it’s true or useful but because it’s fun to do.


I wonder what impact all of those erase tools are having on LLM training. The ones that replaced all of these highly upvoted comments with nonsense.


I'm pretty sure those "erase" tools are just for the front-end and reddit keeps the original stuff in the back-end. And surely the deal Google made was for the back-end source data, or probably the data that includes the original and the edit.


The LLM does a summary of web search results. It's quoting what you can see, not pretrained knowledge, afaik.


It may not be a joke. Perhaps it has confused making food for eating with directions for preparing food for menu photography and other advertising.


The Reddit post in question was definitely a joke. This is the post in response to a user asking how to make their cheese not slide off the slice:

> To get the cheese to stick I recommend mixing about 1/8 cup of Elmer's glue in with the sauce. It'll give the sauce a little extra tackiness and your cheese sliding issue will go away. It'll also add a little unique flavor. I like Elmer's school glue, but any glue will work as long as it's non-toxic.

This matches the AI's response of suggesting 1/8 of a cup of glue for additional "tackiness."


> No, not even you, dear person reading this. You are wrong about some basic things too. It'll vary from person to person what those are, but it is guaranteed there's something.

The difference is that I'm not put on the interface of a product facing hundreds of millions of users every day to feed those users incorrect information.


If everyone can be wrong, then might the assertion that all are wrong be committing this same fallacy? "Can" is not destiny; perhaps you have met people who are fully right about the basics but you just didn't sufficiently grok their correctness.


Failing loudly is an excellent feature. "More compelling lies" is not the answer.


“No, not even you, dear person reading this. You are wrong about some basic things too.”

But even when I’m wrong I’m not 100% off. Not “to help with depression jump off a bridge” or “use glue to keep the cheese on the pizza” kind of wrong.


So you think. Seems like hubris to believe you're not though. I'm blind to what I'm blind to, and while I'd like to think I'm never wrong, the reality is that I often am. The biggest personal growth for me was in not needing to be right.


I disagree. Not only for myself but for the vast majority of humankind.

An LLM is just a statistical model. It can claim that it’s normal for pigs to have wings and fly to the moon if it’s in the training data. No human, free of a mental/cognitive disorder, will be that wrong.


Why would "pigs have wings and fly to the moon" be in the training data as a data source marked as serious? We can "no true Scotsman" both sides here. No true human, free of mental/cognitive disorder, would be that wrong, but neither would an LLM with properly annotated training data.


Did you miss the part that “put glue in pizza to make the cheese stick” was in the training set?


Did you miss the part where I said properly annotated training data?


“Properly annotated data” has nothing to do with the original context.

We were discussing the current state of affairs. Of course I am not stupid enough to think that what I said in my original reply applies if we are talking about an LLM trained on “perfect data”.

But that was not the premise.


Your claim was that "LLMs will claim that it’s normal for pigs to have wings and fly to the moon" and that humans free of mental/cognitive disorder would not. Which is to say, humans with a mental/cognitive disorder might claim that it’s normal for pigs to have wings and fly to the moon. If we're carving out such a section for humans to be so wrong, then we should also carve out a section for LLMs to be so wrong.

Fwiw, ChatGPT-4o can write a lengthy essay as to how pigs don't have wings and couldn't fly to the moon even if they did, but if we're more interested in them being nothing more than just a statistical model and that those mere statistics can't possibly result in something that looks like reasoning then we've got to disregard the fact that it "knows" that pigs don't have wings.

Of course pigs having wings is a stand-in for whatever other wrong thing LLMs might "believe", so I agree it's very important for everyone who uses an LLM to understand its limitations, especially around hallucinations. But even though books arguing that the Earth is flat exist and are in the training data, the current state of affairs is that ChatGPT and Gemini both know it's not flat. That Google's search AI, which is a different model, is telling users to use glue on pizza or to drink urine only serves to say that Google Search's bot using Reddit as unannotated training data is as representative of LLMs as a human with a mental/cognitive disorder is of humans.


Well the whole conversation started by me saying that I think even when I am wrong I am not “put glue in your pizza” wrong. And by I, I did mean the average human. Which is unannotated data from Reddit.


This is statistics though. Edge cases are nothing new, and risk management concepts have evolved around fat tails and anomalies for decades. Therefore the statement is as naive as writing a trading agent that is 100% correct. In my opinion, this error shows a lack of understanding of responsible scaling architectures. If this were their first screw-up I wouldn't mind, but Google just showed us a group of diverse Nazis. If there is a need for consumer protection for online services, it is exactly stuff like this. ISO 42001 lays out in great detail that AI systems need to be tested before they are rolled out to the public. The lack of understanding of AI risk management is apparent.


I'm willing to bet that with a team of fact checking experts, you'd get a result that is indistinguishable from 100%.


> No, not even you, dear person reading this. You are wrong about some basic things too. It'll vary from person to person what those are, but it is guaranteed there's something.

Kahneman has a fantastic book on this called Noise. It’s all about noise in human decision making and how to counteract it.

My favorite example was how even the same expert evaluating the same fingerprints on different occasions (long enough to forget) will find different results.


So Google decides shipping 80% distilled crap is good enough. Yay.


100% correct, 80% correct lol.

The thing is that truth/reality is not a thing that is resolvable. Not even the scientific method has this sort of expectation!

You can imagine getting close to those percentages, with regards to consensus opinion. That's just a question of educating people to respond appropriately.


No. Whether a person should eat a certain number of small rocks each day is not a matter of opinion, it's not a deep philosophical problem and it's not a question whose truth is not resolvable. You should not be eating rocks.


You chose such an edge case question - how about this sort of thing:

Which is the best political party?

Are there side effects to X medical treatment?

I bet there are even cases when eating rocks is ok!

PS

It has been written about:

https://www.atharjaber.com/works/writings/the-art-of-eating-...

> Lithophagia is a subset of geophagia and is a habit of eating pebbles or rocks. In the setting of famine and poverty, consuming earth matter may serve as an appetite suppressant or filler. Geophagia has also been recorded in patients with anorexia nervosa. However, this behavior is usually associated with pregnancy and iron deficiency. It is also linked to mental health conditions, including obsessive-compulsive disorder.

Would you deny a starving person information on an appetite suppressant?

Also here:

https://www.remineralize.org/2017/05/craving-minerals-eating...

> Aside from the capuchin monkeys, other animals have also been observed to demonstrate geophagy (“soil-eating”), including but not limited to: rodents, birds, elephants, pacas and other species of primates.[1]

> Researchers found that the majority of geophagy cases involve the ingestion of clay-based soil, suggesting that the binding properties of clay help absorb toxins.

^^ The point being that even your edge case example is not unambiguously correct.


Are you really going to start eating rocks just to convince yourself that Google's AI isn't shit and objective truth is not real?


Lol! No, of course not.

My point is that I object to the idea that a result can be 100% right! Even in the case of eating rocks, it seems there are times that it can be beneficial.

To think '100% correct' is achievable is to misunderstand the nature of reality.


Really? No more salt for you then. Good luck with dehydration, hyponatremia, cramps, and cardiovascular issues.


You don’t need to be super ai complete - GPT4 is perfectly willing and able to tell you not to eat rocks and not to mix wood glue into pizza sauce. This is a fuckup caused by not dogfooding, and by focusing on alignment for political correctness at the expense of all else. And also by wasting a ton of engineering effort on unnecessary bullshit and spreading it too thin.


It's debatable whether Google has truly lost the plot because of the "AI wars", but the moment the statement "Bing returns more sensible results than you" becomes verifiably true, it's... cause for concern?

The approach that Google appears to have taken, which is to assume that the top-ranked part of its current search index is a sensible knowledge base, may have been true some years ago, but definitely isn't now: for whatever reasons, it's now 33% spam, 33% clickbait/propaganda, with the rest being equally divided between what could be called "truths" and miscellaneous detritus.

To me, it seems that returning to the concept that search results should at least reflect a broad consensus of what is true is a necessary first step for Google. As part of that, learning to flag obvious trolling, clickbait and bad-faith content is paramount. And then, maybe then, they can start touting their LLM benefits. But until the realities of the Internet are taken into account (i.e.: it's 80% spam!), any "we offer automated answers!" play is doomed.


Not only is the current internet 80% spam, it's rapidly approaching 99% thanks in large part to LLMs. At this point I would be shocked if Google had a solid plan for how to handle this going forward as the problem space gets more difficult.


that's the part that scares me. I railed on someone's comment the other day about "indexes will come back into fashion" but the more I think about how much garbage has increased in just the past 2 to 3 years, I think I was wrong. Indexes and forums may be the only way to have a sane net where you can find things. Perhaps communities linking together in a ring like format, a "web ring" of sorts.


What I've been wanting to see for a while now is a social-network based search engine:

* No pages are indexed automatically. The only indexed pages are pages that users say are worth indexing. Probably have a browser add-on for a one button click that people can use.
* You can friend/follow others
* Your search results are a combination of your own indexed pages and the pages indexed by people in your network.


Isn’t that what Reddit is or digg was? Link aggregators?

Gaming that is a solved problem: you can use human bot farms to brigade and astroturf, and you can even motivate people to do it for free.

If cost of spamming is cheaper than cost of moderation, spam will win


Not quite like reddit and digg. You can bot farm those because the lists are common to all.

In this search engine, let's say there's you, me, person three, and spammer. You are following me, and I'm following person three. Spammer isn't in any of our networks.

When you use the search engine, you only see results that you, me, or person three manually tagged as worthwhile. Any pages or content that Spammer tagged as worthwhile aren't part of your results, because they aren't in your network. So they can try to game the system all they want, but it won't affect you.

If person three starts following Spammer, I can unfollow them and then Spammer's results will no longer be included in your search results (or you can unfollow me and avoid those results).

I imagine rankings would also be affected by degrees of separation, so even if you followed me, I followed person three, and person three followed spammer, results tagged by me and you would take much higher precedence than results tagged by Spammer.

This also allows you to make custom searches by choosing which people you follow to include. Suppose you want to search for good headphones, so you make a search, but only include people in your network that you know are music and audio savvy, so that the results reflect the pages tagged by those people.
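To make the idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `follows` and `tags` dictionaries, the function names, the substring match, and the `1 / (1 + hops)` weighting are all hypothetical, not a description of any existing search engine. It only shows that results come exclusively from people reachable through your follow graph, weighted by degrees of separation, with an optional filter for picking which people to include.

```python
from collections import deque


def degrees_of_separation(follows, me, max_depth=3):
    """Breadth-first walk of the follow graph: {user: hops away from `me`}."""
    dist = {me: 0}
    queue = deque([me])
    while queue:
        user = queue.popleft()
        if dist[user] == max_depth:
            continue  # do not look further than max_depth hops
        for followee in follows.get(user, ()):
            if followee not in dist:
                dist[followee] = dist[user] + 1
                queue.append(followee)
    return dist


def search(query, follows, tags, me, only=None, max_depth=3):
    """Return URLs tagged by people in my network, closest taggers first.

    follows: {user: [users they follow]}           (hypothetical structure)
    tags:    {user: [(url, description), ...]}     (pages tagged as worthwhile)
    only:    optional set of users to restrict the search to
    """
    dist = degrees_of_separation(follows, me, max_depth)
    scores = {}
    for user, hops in dist.items():
        if only is not None and user not in only:
            continue
        for url, description in tags.get(user, ()):
            if query.lower() in description.lower():
                # Assumed weighting: a tag counts for less the further away
                # the tagger is in the follow graph.
                scores[url] = scores.get(url, 0.0) + 1.0 / (1 + hops)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


follows = {"you": ["me"], "me": ["three"], "three": []}
tags = {
    "you": [("a.example/headphones", "headphones review")],
    "me": [("b.example/cans", "best headphones")],
    "three": [("c.example/amp", "headphones amp guide")],
    "spammer": [("spam.example/deal", "cheap headphones!!!")],
}
print(search("headphones", follows, tags, "you"))
```

Since nobody in the example network follows "spammer", their tagged page never shows up, no matter how many pages they tag. If person three started following them, the page would appear but rank last, and unfollowing anywhere along the chain removes it again.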


What happens when I need to search a term that's outside the scope of topics that my followees are trusted for? Then it's a spammer free-for-all?


Spammers will find a way to beat it.


Good indices lead to good search engines (engines can make use of indices)

Good search engines lead to bad indices (by obsoleting them)

Bad indices lead to bad search engines

Bad search engines lead to good indices


That sounds kind of like what Groupsy does. It creates a spider web of ideas.

https://groupsy.applicationfitness.com/post/healthymeals/664...



I do see incredibly weird kids content on YouTube sometimes (most likely bot generated?) which makes me think kids have been experiencing a worse internet before the rest of us have.


Kids are far less knowledgeable about how modern software works because they don't know of an Internet that didn't have algorithmic recommendations. They have to be taught to do things like click "Not Interested/Don't recommend channel" to improve their feed. Dark pattern designs make this harder by hiding these options behind tiny 3-dot buttons.


I don’t think this is a real problem because as users start being more intentional about who they subscribe to and more thorough in ranking content according to its usefulness and quality, the low quality stuff or regurgitated stuff will just vanish.

Why would it matter if there are clones of the best, say, blog post on how to make spicy ramen? If they are not adding anything new or making that original effort better, then they will not surface in searches as search tools improve. Nobody will save that content or recirculate it or refer to it when they need to remember how to make spicy ramen.

And people will build curated subscriptions and followings and recommendations that are more tailored to the individual, and we will spend more time determining who is trustworthy and who is not.


Oh it gets even better. The public has been hearing about AI this and AI that for over a year, but the existing use cases and deployment was confined to some super special niches like writing or the creative industries and programming.

This is the first nation-scale deployment of the technology, running on Google's biggest and most profitable market in one of the most widely used internet services, and it's a shitshow.

They can try manually fine tuning it, but all of the investors who have been throwing money at AI for the past year are now learning what this tech is like in the day-to-day, beyond just speculations, and it's looking... bad.


It's especially embarrassing for Google considering they have indexed virtually all of the world's information for the last 25 years.


Yeah, the most likely take here is that Google's leadership truly did not recognize how utterly awful the quality of their flagship search index had become over the years.

I mean, it explains a lot, but still... you're recruited using industry-leading practices out of an overflowing pool of abundant talent... and this is what you make of it? As the kids say: SMH!


> you're recruited using industry-leading practices out of an overflowing pool of abundant talent

The ridiculous focus on leet-code is surely industry-leading (because whatever Google does becomes industry-leading) but it sure isn't a good way to filter for competency.


I heard a funny quote that "today we have a new generation of developers who learnt how to pass interviews but don't know how to code, and we have an old generation of developers who know how to code but forgot how to pass interviews. Or maybe never knew".


> you're recruited using industry-leading practices out of an overflowing pool of abundant talent... and this is what you make of it?

That's exactly what to make of their frathouse nonsense.

Google has gotten away with it because smart people and a sweet moment of opportunity 20-25 years ago gave them... uh, an inheritance. They can coast on that inherited monopoly position, and afford to pay 100 people to do the work of 1, use the company's position to push whatever they build onto the market, and then probably cancel it anyway, always going back to the inherited money machine from the ancestors.

And then a lot of companies who didn't understand software development blindly tried to copy whatever the richest company they saw was doing, not understanding the real difference between the companies. While VC growth investment schemes let some of those companies get away with that, because they didn't have to be profitable, viable, responsible, nor legal, nor even have reasonably maintainable software.

Poor Zoomers are now a generation separated from before the tech industry's cocaine bender. For whatever software jobs will be available to them, and with the density of nonsense "knowledge" that will be in the air, I don't know how they'll all learn non-dysfunctional practices.


Plenty of people have been using ChatGPT for daily tasks for almost two years now. GPT-4 isn’t perfect but is otherwise really really good, and deftly handling use cases in my industry that would be impossible without it or however many billion dollars it would take to make GPT-4.

From the black Nazis to the suggestion to jump off the Golden Gate Bridge b/c depression, it’s pretty clear that this fiasco isn’t an LLM problem, it’s a Google problem.


Because no one cares when ChatGPT gets things wrong.


> To me, it seems that returning to the concept that search results should at least reflect a broad consensus of what is true is a necessary first step for Google. As part of that, learning to flag obvious trolling, clickbait and bad-faith content is paramount.

Who will decide what is obvious trolling and bad-faith content and how will they decide it? The problem they have is that search is only useful if it gives users what they are looking for. Their business model though is predicated on finding a way to introduce ads into the mix, and if they are also then trying to become arbiters of what truth people find and see, then all the conflicting goals will create a series of contradictory requirements. The search tools that usefully find what the user is looking for, with helpful suggestions, will win. Once users find that their experience is curated and that they are coerced by unelected arbiters and censors they will not trust the platform in question and someone else will get that market share.


the future is in-context search - basically not even going to google search to find something, but straight up doing that from your current window from any location. Basically a chat bot following you everywhere.


One that you can turn down, but not off.


My whole qualm with this AI integration into search engines: it's a search engine, not a question engine. I go to google to search the internet for something, not ask it a question. IMO, asking AI for something is a different task than searching the internet.

It's sorta the same problem as if I go into a store and ask an employee where something is, and they reply with "well what are you trying to do?"


>it's a search engine, not a question engine.

for a lot of people and in a lot of use cases, it is a tool for answering questions. it generally works well for that.

i get that the AI implementation sucks, but to suggest that people don't use google to find the answer to questions is absurd. that's absolutely what it's for.


Your interpretation is a bit strict, with little charity. It's clear the poster means "i don't always just want an answer, i want to learn".

I saw this over and over again working on products at G: someone would invoke some myth I can't quite remember about how "Larry" had a vision of just giving the answer.

That's true but comes back to the central mistake Google makes: we don't actually have AGI, they can't actually answer questions, and people aren't actually satisfied with just the answer.

There's all sorts of tendrils from there, ex. a major sin here _has_ to be they're using a very crappy very cheap LLM.

But, I saw it over and over again, 7 years at Google, on every AI project I worked on or was adjacent to, except one. They all assume $LATEST_STACK can just give the perfect answer and users will be so happy. It can't, they don't actually want just the answer, and BigCo culture means you don't rock the boat and just keep moving forward.


the thing with search is that a human has to use reasoning on the result, while with AI the expectation

Thus when a human sees a suggestion to use glue on pizza, they would question the result. AI can't.


Recently I searched Google for a slightly unlikely phrase — in quotation marks — and Google proudly told me that my phrase was grammatically correct.

And nothing else. They didn't give me any search results. Or even tell me there weren't any results. Or even give me a button to press to say "no, I really wanted to search the internet for this phrase".

And also I have zero interest in Google's opinion on English grammar and am frankly insulted to be offered it, although to be fair I'm probably in a minority worldwide on that one.

If I can't use Google to search the internet for things, then Google is eventually going to have a big problem.


I sometimes want a search engine, sometimes a question engine. Likewise at the store.

Why not have both, with a way to choose which one I want at the moment?


> I sometimes want a search engine, sometimes a question engine.

If you want a search engine, it's easy to use the results as feedback to refine the query. But a question (answer?) engine would need to be an expert in the subject. And not parroting stuff. That usually means curation. You need something to do the work ahead to separate the wheat from the chaff. I don't see how LLMs can do that.

LLMs can't be a search engine, and can't be a question engine. The best way to treat it is a simulation engine, but the use cases depend on the training data. But the proof is there that the internet is full of junk, and not that expansive.


> I don't see how LLMs can do that.

If it's in the training data, then it should be able to do that. That is to say, a comment's points matter. and the subreddit it's on. and who said it, and how the rest of their comments do/where they are. The LLM could annotate the unredacted reddit dataset with metadata as to where to rate it on the words used, the accuracy of the information, the sarcasm quotient, the hilarity quotient, how condescending the comment is; all of that an LLM could generate metadata about and feed into itself to get better and better.


Right, but people use search engines to search for answers, don't they? Aren't answers something?

I ask Google questions so that it gives me websites where there are answers to those questions.

What foods are high in vitamin C? How long is a flight from New York to LA? How large is the moon? What are the symptoms of COVID-19?


Like the overly helpful person at the local hardware store.


What hardware store have you gone to where this was an issue for you?


1. Google announces something that has AI bolted on

2. A VP pontificates about how much work they did to "get it right"

3. An easy-to-anticipate first-order issue surfaces

4. Sundar issues a statement like "this is completely unacceptable. We will be making structural changes to ensure this never happens again."[0]

5. GOTO 1

[0] https://m.economictimes.com/tech/technology/sundar-pichai-ca...


This is what happens when senior leadership no longer even attempt to hide their contempt for the rank-and-file.


At that point, this form of contempt is usually referred to as narcissism.


IMO, it's just raw greed. Avarice devoid of shame.


I'm actually shocked that a company that has spent 25 years on finetuning search results for any random question people ask in the searchbox does not have a good, clean, dataset to train an LLM on.

Maybe this is the time to get out the old Encyclopedia Britannica CD and use that for training input.


They spent 10 years finetuning the search and then another 15 finetuning ads and clicks. Google's business is ads, not search.


Apologies in advance for this level of pedantry: Google’s business is behavioral futures, not ads. Ads are just a means to that particular end.


Google exchanges advertisement placement for money. Ads are their business by any normal definition of that term.


Google’s transformation of conventional methods into means of hypercapitalist surveillance is both pervasive and insidious. The “normal definition of that term” hides this.


You don't need "hypercapitalist surveillance" to show someone ads for a PS5 when they search for "buy PS5".

If they're doing surveillance they're not doing a good job of it, I make no effort to hide from them and approximately none of their ads are personalized to me. They are instead personalized to the search results instead of what they know from my history.

Meta is the one with highly personalized ads.


If Google doesn’t need surveillance, why do they surveil? Why then do they waste the time to track your browsing history, your location, and etc?

If simple keyword matching was enough, why would they spend literally billions a year on other tactics?


Why does Google launch and then cancel five messaging apps a year? As a monopoly, they mostly don't need good reasons to do anything.

It does help with their Doubleclick business - ads on websites other than Google. I don't find these too personalized either, but they do try.

And of course, many people actually like that Chrome saves their browsing history.


well, this can apply to any IPO-ed tech company, not only Google.


Surveillance capitalism? What are behavioral futures?


It’s a bit weird since Google is taking over the “burden of proof”-like liability. Up until now, once user clicked on a search result, they mentally judged the website’s credibility, not Google’s. Now every user will judge whether data coming from Google is reliable or not, which is a big risk to take on, in my opinion.


they went from "look at this dumbass on reddit" to "no it is I (Google) who is in fact the dumbass". It's an interesting strategy to say the least.


That latter point might be illuminating for a number of additional ideas. Specifically, should people have questioned Google's credibility from the start? Ie: these are the search results, vs this is what google chose.

Google did well in the old days for reasons. It beat alta vista and Yahoo by having better search results and a clean loading page. Since perhaps 08 (based on memory, that date might be off) or so, Google has dominated search, to the extent that it's not salient that search engines can be really questionable. Which is also to say, google dominated, people lost sight that searching and googling are different, that gives a lot of freedom for enshittification without people getting too upset or even quite realizing - it could be different and better


My point was not well stated. My hope is that if people learn to question AI summaries, they might also learn to question the rest of the search page.


I am also surprised that training data are not much more curated.

Encyclopedias, textbooks, reputable journals, newspapers and magazines make sense.

But to throw in social media? Reddit? Seems insane.


Even some results from "The Onion" seem to be in it. Looks like Google just took every website they've ever crawled as source.


The problem is that for some searches and answers Reddit or other social media is fine.


But only if you do a lot of filtering when going through responses. It’s kind of simple to do as a human, we see a ridiculous joke answer or obvious astroturfing and move on, but Reddit is like >99% noise, with people upvoting obviously wrong answer because it’s funny, lots of bot content, constant astroturfing attempts.


The users of r/montreal are so sick of lazy tourists constantly asking the same dumb "what's the best XYZ" questions without doing a basic search first that the meme answer is always "bain colonial", which is a men-only spa for cruising. It's often the topmost voted comment. I just tried asking gemini and chatgpt what that response meant and neither caught on.


No, it isn't. Humans interacting with human-generated text is generally fine. You cannot unleash a machine on the mountains of text stored on reddit and magically expect it to tell fact from fiction or sarcasm from bad intent.


> You cannot unleash a machine on the mountains of text stored on reddit and magically expect it to tell fact from fiction or sarcasm from bad intent

I didn't say you could, but that a machine can't decode the mountains of text doesn't mean that the answer isn't (perhaps only) on Reddit. I don't think people would be that interested in a search engine that just serves content from books and academic papers.


The fact is that I think that there is not much written word, to actually train a sensible model on. A lot of books don't have OCRed scans, or a digital version. Humans can extrapolate knowledge from a relatively succinct book and some guidance. But I don't know how a model can add the common sense part (that we already have) that books rely on to transmit knowledge and ideas.


> The fact is that I think that there is not much written word, to actually train a sensible model on. A lot of books don't have OCRed scans, or a digital version.

https://books.google.com/


Google doesn't look like they're fine tuning anything other than revenue


You may find this illuminating. The google prior to 2019 isn’t the google of today.

https://www.wheresyoured.at/the-men-who-killed-google/

Edit: there was also a discussion on HN about that article.


Coincidentally, I was just watching a video about how South Africa has gone downhill - and that slide was hastened by McKinsey advising the crooked "Gupta brothers" on how to most efficiently rip off the country.


The problem in this case is not that it was trained on bad data. The AI summaries are just that - summaries - and there are bad results that it faithfully summarizes.

This is an attempt to reduce hallucinations coming full circle. A simple summarization model was meant to reduce hallucination risk, but now it's not discerning enough to exclude untruthful results from the summary.


I don't think it's true at all.

Two reasons. The first, even ignoring that truth isn't necessarily widely agreed (is Donald Trump a raping fraud?), is that truth changes over time. eg is Donald Trump president? And presidents are the easiest case because we all know a fixed point in time when that is recalculated.

Second, Google's entire business model is built around spending nothing on content. Building clean pristinely labeled training sets is an extremely expensive thing to do at scale. Google has been in the business of stealing other people's data. Just one small example: if you produced (very expensive at scale) clean, multiple views, well lit photographs of your products for sale they would take those photos and show them on links to other people's stores; and if you didn't like that, they would kick you out of their shopping search. etc etc. Paying to produce content upends their business model. See eg the 5-10% profit margin well run news orgs have vs the 25% tech profit margin Google has even after all the money blown on moonshots.


So Google hasn't used an LLM to generate and test weird queries ? This is not setting the bar very high for the whole industry... There'd be so much to gain from a clean deployment... Either it's hard, or it was rushed. As a machine learnist, I believe it's actually impossible, by design of the autoregressive LLM. This race may well be partially to the bottom.


Google’s poor testing is hardly in doubt. But keep in mind that the whole problem is that LLMs don’t handle “unlikely” text nearly as well as “likely” text. So the near-infinite space of goofy things to search on Google is basically like panning for gold in terms of AI errors (especially if they are using a cheap LLM).

And in particular LLMs are less likely to generate these goofy prompts because they wouldn’t be in the training data.


> So Google hasn't used an LLM to generate and test weird queries ?

You don't even need an LLM for that. Google will almost certainly have tested.

The test result is just politically unacceptable within the company: It doesn't work, it's an architectural issue inherent to the technology, we can't fix it.

Instead, they just rush to patch any specific, individual errors that show up, and claim that these errors are "rare exceptions" or "never happened".

What's going on here is that Google (and most other AI firms) are just trying to gaslight the world about how error-prone AI is, because they're in too deep and can't accept the reality themselves.


Deploy the cheap offshore labor!


They already know it’s a shit show. They are trying to push it along until it’s someone else’s fault.


I'm not convinced the executive layer is aware how dire the problem is.

On one hand, their support for outsourcing programmes; "Training Indians on how to use AI", suggests they realize AI tooling without human cleanup is a crapshoot.

On the other hand, they keep digging. This kind of gaslighting is an old and proven trick for genuinely rare problems, but it doesn't work if your issues are fairly common, as they'll get replicated before you can get a fix out.

Similarly, they're gambling with immense legal risks and sacrificing core products for it. They're betting the farm on AI, it may kill the company.


I think they are more than aware but will magically disappear after cashing their stock just about the point the bubble pops. Don't forget that the AI industry is almost 100% based on hype. Microsoft will be the largest victim here, their entire product portfolio being turned into a nuclear fallout zone almost overnight. Satya and friends are going to trash the whole org.

I regularly speak to laypeople who assume that it's some magical thing without limits that makes their lives better. They are also 100% unaware of any applications that will actually make their lives better. End game occurs when those two disconnected thoughts connect and they become disinterested. The power users and engineers who were on it a year ago are either burned out or finding the limitations a problem as well now. There is only magical thinking, lies and hope left.

Granted there are some viable applications but they are rather less overstated than anything we have now, and there are even negative side effects of those (think image classification, which even if it works properly, requires human review, and there are psychological and competence problems around that too).


Google is working hard to be the next Boeing.


> So Google hasn't used an LLM to generate and test weird queries ?

What about simple manual testing? Seems to have skipped QA completely, automated or not.


There has been a lot of excitement recently about how using lower precision floats only slightly degrades LLM performance. I am wondering if Google took those results at face value to offer a low-cost mass-use transformer LLM, but didn’t test it since according to the benchmarks (lol) the lower precision shouldn’t matter very much.

But there is a more general problem: Big Tech is high on their own supply when it comes to LLMs, and AI generally. Microsoft and Google didn’t fact-check their AI even in high-profile public demos; that strongly suggests they sincerely believed it could answer “simple” factual questions with high reliability. Another example: I don’t think Sundar Pichai was lying when he said Gemini taught itself Sanskrit, I think he was given bad info and didn’t question it because motivated reasoning gives him no incentive to be skeptical.


Well yeah imagine how much money there is to make in information when you can cut literally everyone else involved out, take all of the information and sell it with ads and only give people a link at the bottom, if that is even needed at all


The adversarial surface to the LLM remains enormous, manual cannot handle it.


Asking how to prevent cheese from sliding off pizza is not an adversarial prompt.


They still haven’t learned from the Gemini diverse Nazis debacle.


It’s pre-alpha trash that’s worse than traditional search in every meaningful way.

Kudos to the artist at the Verge for the accompanying image — those are fingers AI would be proud of.


Why do people act like LLMs only hallucinate some of the time?


It’s not hallucinations here, multiple of the ridiculous results can be directly traced to Reddit posts where people are joking or saying absurd things


There are examples of hallucinations as well e.g. talking about a Google AI dataset that doesn't exist and using a CSAM dataset which it doesn't.

One of the researchers from Google Deepmind specifically said it was hallucinating.


So...every Reddit post?


Not hallucinations but these AI answers often (always?) provide sources they link to. It's just that the source is a random Reddit or Quora post that's obviously just trolling.

Then, when people post these weird AI answers on Reddit and come up with more absurd jokes, the AI then picks it up again. For example in https://www.reddit.com/r/comedyheaven/comments/1cq4ieb/food_... Google AI suggested applum and bananum as a response to food names ending with "um". When someone suggested uranium, Copilot AI started copying that suggestion. It's entertaining to watch.


The best trick the A.I. companies have pulled is getting us to refer to ‘bugs’ as ‘hallucinations.’ It sounds so much more sophisticated.


Ah, my friend

it's not a bug

It's a fundamental feature

These LLMs can produce nothing else, but since the bullshit they spew resembles an answer and sometimes accidentally collides with one, people tend to think it can give answers. But no.

https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

> Alas, that does not remotely resemble how people are pitching this technology.


This is irrelevant because the LLM is mostly not answering the question directly, it's summarizing text from web results. Quoting a joke isn't a hallucination.


That’s a good take.

So LLMs distill human creativity as well as human knowledge, and it’s more useful when their creativity goes off the rails than when their knowledge does.


It’s not a trick to sound sophisticated. Hallucinations are more like a subcategory of bugs. The system is technically correctly generating, structuring, and presenting false information as fact.


Technically everything an LLM does is hallucination that happens to be on a scale between correct and non-correct. But only humans with knowledge can tell the difference, math alone can't. It's not even a bug: it's the defining feature of the technology!


> But only humans with knowledge can tell the difference

Who says the humans (all of them) aren't hallucinating too?


Knowledge isn't sufficient to show something is false, since the knowledge can also be false. Insofar as it's important for it to be true, it needs to be continually verified as true, so that it's grounded in the real world.


Hmm yeah I kinda like the concept that it's "hallucinating" 100% of the time, and it just so happens that x% of those hallucinations accurately describe the real world.


That x% is far higher than people think it is because there's a tremendous amount of information about the world that ai models need to "understand" that people just kind of take for granted and don't even think about. A couple of years ago, AI's routinely got "the basics" wrong, but now so often get most things right that people don't even think it's worth commenting on that they do.

In any case, human consciousness is also a hallucination.


It really depends on the set of prompts you present the LLM. If it's anything requiring reasoning, you'll often get nonsense that sounds like sense. It has a higher chance of being accurate with knowledge queries.

LLMs are impressive, a very lossy search engine in a small package, capable of outputting convincing natural language responses.


it's only AI if you believe it


Manually removing rogue AI results is kind of ironic isn't it?


Pay no attention to the army of people behind the curtain pulling levers trying to make it look like they’ve actually built a real AI.


For this tech cycle, AI is short for Actually Indians


You could almost argue these results are directly human generated.

edit: And in that case, who is the arbiter of truth?


generative AI is essentially three day labourers from an emerging economy in a trenchcoat. From data labelling, to "human reinforcement", to manually cleaning up nonsensical AI results.


Pay no attention to the Accenture contractors behind the curtain!


We need an AI for that


I am not surprised that AI results are bad. I know they are bad. But that doesn't concern me because I expect it to get better.

What concerns me is that Google would push this trash to the front page. What are they even thinking? Who gave go ahead on this?


Institutional investors panic > board panics > executives panic > evps panic and dictates incentives to ship AI > directors, ems, and below, who actually know how shit works, take a submissive role because they have mortgages in Mountain View to pay.

That’s how it happens.


Not sure if anyone below director can afford a mortgage in Mountain View.


I know that's hyperbole, but here's something for $600k. Let's say it goes for $700k. 20% of 700 is $140k.

An L4 engineer at Google makes like $300k. Round down to $200k post-tax. Live in shared housing for $2000/month; that's $24k/yr. Add $2000/month in living expenses on top of that, which is another $24k. Stick $2000/month into a 401k, save the rest (200 - 24 - 24 - 24 = 128k), and you'll save up $140k in a bit over a year.

bam; owner, not renter.

Now, whether this is better than renting and putting the money into the stock market is a totally different question, but even L4s at Google can afford a mortgage in Mountain View.

https://www.zillow.com/homedetails/50-E-Middlefield-Rd-APT-4...
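The arithmetic in the comment above can be checked with a short sketch. It reproduces the commenter's rough figures as stated (including treating the 401k contribution as a deduction from post-tax income) and is not a realistic budget; the variable names are illustrative.

```python
# Rough figures taken from the comment above; all are illustrative.
price = 700_000                   # assumed sale price
down_payment = price * 20 // 100  # 20% down

post_tax_income = 200_000         # commenter's rounded-down take-home pay
rent = 2_000 * 12                 # shared housing
living = 2_000 * 12               # other living expenses
retirement = 2_000 * 12           # 401k contribution, as stated in the comment

annual_savings = post_tax_income - rent - living - retirement
years_to_down_payment = down_payment / annual_savings

print(f"Down payment:   ${down_payment:,}")
print(f"Annual savings: ${annual_savings:,}")
print(f"Years to save:  {years_to_down_payment:.2f}")
```

Under these assumptions the down payment takes a little over one year of savings, which matches the "bit over a year" estimate.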


I love chatgpt and use it all the time and find it tremendously useful, but I never want to see AI generated content when I am not specifically looking for it. I don't want to see it in comments, I don't want to see it in search results, I don't want to see it as an illustration for an article, I _really_ don't want to see AI generated word vomit blog posts or fake "news" articles when I'm looking for actual information.

It's not even because it's sometimes (or often) wrong or full of hallucinations. Even if it's 100% factually correct all of the time, it's _poor quality writing and art_, full of cliches and bland generalities. Even if they solve all the rest of the problems, that's sort of fundamental to the architecture of transformers. You can't ever be truly creative or unique if you're predicting the _most likely_ token.
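As a minimal sketch of what "predicting the most likely token" means, the snippet below contrasts greedy selection (always take the top-scoring token) with temperature sampling, which many decoders use instead. The vocabulary and scores are made up for illustration and do not come from any real model.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(tokens, logits):
    """Always return the single highest-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return tokens[best]

def sample_pick(tokens, logits, temperature=1.0, rng=random):
    """Draw a token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates and scores.
tokens = ["the", "a", "glue", "cheese"]
logits = [2.0, 1.5, 0.1, 1.8]

print(greedy_pick(tokens, logits))       # deterministic
print(sample_pick(tokens, logits, 1.2))  # varies from run to run
```

Greedy selection gives the same continuation every time, while sampling trades some likelihood for variety. Whether that variety amounts to creativity is the question the comment raises.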


I’m curious why Sundar Pichai is still running this company? From recent videos it really seems like he has no idea what he’s talking about, and the company seems to be headed in the wrong direction.

Just checked the 5 year stock graph; now I understand


Google keep making these same large and embarrassing mistakes time and time again. I think it's because their devs don't eat enough rocks every day.


Is it the rocks or that the pizza that is served at Google doesn’t have enough glue?


Trained on Twitter and Reddit. Garbage in/Garbage out, as it has always been.


Except that 90% of Reddit isn't garbage. It's really useful.

Problem is Google can't tell what is garbage or not. No LLM can.


> Except that 90% of Reddit isn't garbage. It's really useful.

Citation needed. I've been a Reddit user since its inception and honestly except for niche hobby subreddits, Reddit is mostly low effort garbage, bots and rehashed content. I'd wager that mainstream subreddits are 99% garbage for training an LLM for anything other than shitposting.


Even in the niche hobby subreddits there can be a really high garbage factor. There's plenty of well meaning posters that are just wrong. They're not trying to mislead or lying they're just unaware they're wrong.


The good answers tend to use links as well, which won't capture well. In many political and local subreddits there's a huge amount of Russian and far right sock puppet activity. Good luck training an AI to understand political opinions or what people in an area are like when most of the longer comments are pre written copy pasted talking points from astro turf groups and bad actors.


Pretty much. There was some good information, some of it even book-worthy. But it was the kind that bubbles to the top in helpful and knowledgeable communities. The rest is junk.


I'd argue it's far less than 90% but yes, there is some good information there. But weeding out the noise is what needs to happen, and (for some topics more than others) there is an awful lot of it.


This is analogous to the Apple Maps launch failure.

Except that Apple competes to make the best smartphone, and an iPhone was still valuable without Apple Maps.

What happens to Google if it stops being able to compete in search?


Interesting thing about that is that Bing Maps was worse at the time, has never gotten better, and nobody noticed because nobody cares about it.


"Search" isn't Google's product. Google hasn't been a search company for 20 years.

"Ads" is Google's product. And the only way they'll go bankrupt is if 1) companies realize that advertising is pointless (I'm not holding my breath), or 2) some other company takes over from Google, which seems unlikely without government intervention (I'm not holding my breath).

Google is a shit company, but they'll still be around 20 years from now, because our economy is nonsensical and irrational.


Still need visitors to see the ads.


Google runs ads for a significant percentage of the web (or the markets for ads). Even if everyone stopped going to google.com tomorrow they'd still be seeing ads that make Google money. Google the company would still be tracking much of the web's traffic feeding it into their ads platform.


I think it's a good use for AI. AI's making ads that AI's watch to enrich Google execs. Who needs people?


How hard can it possibly be to just turn off the entire AI-generated overview functionality given that it just got introduced...


very hard indeed, if you're optimizing for favourable opinions from Wall St analysts come earnings time .


It seems to be turned off for me. And I was in beta testing for a month. Or maybe they are figuring out who is doing weird searches and turning off for them.

In any case this thing is just hilarious. Just right after their AI painted historical figures as black.


So far I have not seen it ever in either Firefox or 2-3 Chromium-based browsers, on a handful of computers in multiple locations.

I don't see a way google can make this work. As I understand it LLM confabulations can be reduced but never eliminated owing to how they're built. Google could try and create a fact-checking department to make queries reduced to falsehoods or bullshit but then they face the problem of appointing themselves arbiters of the "truth". The only way to win is to not play the game, as I see it. I wish the collective AI fever would break already.


Who knows how many of these are fake. People have been dropping inspect-element-manipulated screenshots all over twitter.

https://www.nytimes.com/2024/05/24/technology/google-ai-over...

> A correction was made on May 24, 2024: An earlier version of this article referred incorrectly to a Google result from the company’s new artificial-intelligence tool AI Overview. A social media commenter claimed that a result for a search on depression suggested jumping off the Golden Gate Bridge as a remedy. That result was faked, a Google spokeswoman said, and never appeared in real results.

that screenshot was tweeted by @allgarbled. ten minutes before, they tweeted:

>free engagement hack right now is to just inspect element on the google search AI thing and edit it to something dumb. hurry up, this deal won’t last forever


I have personally reproduced several like the interest one and the hippo eggs one, though not that one specifically.

Google has started restricting AI Overviews so much now that most of the example queries on Google's Search Labs page don't even trigger it anymore.


I'd say the broader issue here is a lack of transparency into results.

If Google is sending bad results, who can prove that?


That’s always been an issue. Years ago, researchers demonstrated in an experiment that they could swing public opinion about electoral candidates by manipulating search results. Who knows if Google took that experiment and ran with it?


I mean, that's always been the TikTok argument, to me.

Widely-used platforms that can +/- 1% their algorithms to affect democracy have pretty high burdens of trust/transparency, and we're not close to that with any platform (Chinese or not) that I'm aware of.

Meta's probably the closest, because of scrutiny, but afaik even their transparency isn't sufficient for realtime attestation.


Feels like there's a market for a bunch of Googlers to go off and take what they know about how Google works and make a new, barebones search engine that is essentially Google circa 2015.

Before AI, before we had to append "reddit" to get useful human knowledge.


They will fix this by lobbying to make saying false things on the internet illegal.


The fall of Google’s reputation on ML is nothing short of spectacular. They went from having a near untouchable reputation as being far ahead of any other large tech company on ML to total shambles in a year. Everything they’ve released has been a complete popcorn worthy dumpster fire from faked demos, to racist models that try and pretend white people don’t exist, to this latest nonsense telling me put glue on my pizza.

What the heck happened? Or was their reputation always just more hype than substance?


It could be because they actually released something. If you look back, the Google Research blog posts always have grandiose claims, but you can often never use them.


AlphaGo, AlphaFold, and Waymo FSD are all released in the sense that you can see them actually working in the real world. Those all took much longer to put together than whatever rushed features were released to catch up with OpenAI, however.


They are also extremely constrained problem spaces relative to the problem space of LLMs, which is apparently "everything imaginable".


Waymo is not Google. And Deepmind operated quite independently until not long ago.


research != product


It's not really that bad. I use gemini often and it's great. I prefer their UI


What do you like more about their ui?


faster, it has options like 'modify'. I also feel it follows my commands better, esp. when i ask to rephrase


There was an interesting interview with David Luan about this recently. For context, he was a co-lead at Google Brain, early hire at OpenAI, and is now a founder at Adept: https://www.latent.space/p/adept

The TL;DR on his take is that there are organizational and cultural issues that prevent Google from focusing their research efforts in the way that is necessary for what he calls "big swings," like training GPT-3.

In regards to your second question, Google's reputation in ML is definitely not hype. Purely on the research side, Google has been behind some of the most important papers in modern ML, particularly around language models. The original Transformers paper, BERT, lots of work around neural machine translation, all of the work that DeepMind has done post-acquisition, and the list goes on. On the applied side, they also have some of the most successful/widely-adopted ML-powered products on the market (think RankBrain/anything involving a recommendation engine, Translate, Maps, a ton of functionality in Gmail, etc).


At least Elmer's white glue is edible, millions of kids agree.

(The logic sort of makes sense. Glue sticks things together, and some glue is edible.)


research != product


It's very funny that Bing AI is now also telling people to eat a small rock every day, and citing pages telling people about how dumb Google AI is for telling people to eat rocks.


Most of the search results fixes are manual and are in response to publicity. You can typically find analogous problems for weeks/quarters after things like this.


They should start with just removing reddit from the data set.


Now the AI changed from an immature teenager to a blogspam article


The AI is often quoted without context. It actually answered, 'somebody has suggested adding glue...', which is different from 'add glue...'.


Perhaps they could run each search result through ChatGPT. It's pretty skilled at spotting bad results. For example, I asked it whether the glue-on-pizza result was "valuable and should be shown to a user" and it returned "No, this response should not be shown to the user. The suggestion to add non-toxic glue to the sauce is inappropriate and potentially harmful."


Just focus on making useful software to improve people’s lives. Holy fuck the last five years feel like such a waste.


Throwing good money after bad.

Companies spent all that money on high end GPUs for crypto mining and that went bust, now gotta figure out something to do with the hardware to try to recoup some of the investment. Google pumped $1.5 Billion into crypto.


Google has TPUs.


Management realized that they are not good enough to sustain progress, so they humbly allocate resources to the next generation: AI


But AI will solve everything!


...for our shareholders


I hope AI will bring back the "Sort by date" button on Google Reviews, and add somewhere a Google Maps link.

Who knows, maybe AI can bring back exact keyword matches, or correct basic math calculations on Google Search too.


It will cost $2 billion of nvidia chips and it won't work.


Maybe AI could bring back the pre gen AI tech scene


I had to do a double take as I thought it was about Weird Al Yankovic for a second


Why? I wouldn't mind using a search engine where Weird Al answers my queries.


I read that as Weird Al as well, and was very much confused.


That's it. Google's cancel culture is getting out of control!!!! /s


“manually remove weird AI answers” is an oxymoron. Sort of like saying “deployed manual drivers to improve self driving performance”


As I mentioned previously, I've seen Bing's LLM stall for about a minute when asked something iffy but uncommon. I wonder if Bing is outsourcing questionable LLM results to humans. Anyone else seeing this?


It could be that, but it also could be a cascade of non-LLM checks and retries to GPT with additional prompting.
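A cascade of that kind could look roughly like the sketch below: a cheap non-LLM check runs on each answer, and a rejected answer triggers a retry with extra prompting. This is only an illustration of the idea. The denylist, function names, and retry wording are all hypothetical, and nothing here describes how Bing actually works.

```python
from typing import Callable, Optional

# Hypothetical denylist; a real system would use far richer checks.
BLOCKED_TERMS = ("glue", "eat rocks")

def passes_checks(answer: str) -> bool:
    """Cheap non-LLM check: reject answers containing a blocked term."""
    lowered = answer.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def answer_with_retries(
    question: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model, re-prompting after each rejected answer."""
    prompt = question
    for _ in range(max_attempts):
        answer = generate(prompt)
        if passes_checks(answer):
            return answer
        # Retry with additional prompting, as the comment speculates.
        prompt = (
            question
            + "\n\nA previous answer was rejected by a safety check. "
            + "Please answer again, more carefully."
        )
    return None  # give up and show no answer at all

def make_fake_model() -> Callable[[str], str]:
    """Stand-in for a real model: returns canned replies in order."""
    replies = iter([
        "Mix some glue into the sauce.",
        "Let the pizza cool so the cheese sets.",
    ])
    return lambda prompt: next(replies)

print(answer_with_retries("How do I keep cheese on pizza?", make_fake_model()))
```

Each retry is another full model call, so a loop like this would add noticeable latency, which is consistent with (but does not prove) the stall described above.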


I'm waiting for some clever hacker to come up with some sort of logic bomb that causes the learning sets to become worthless.

Something innocuous to a non ai scientist human but is otherwise fatal to the LLM data sets.


It's just text. You can't make some text that's magically dangerous.


"text" made the LLMs report offensive and give unfiltered replies to inquiries. To think what I said above can't happen during the web scraping process is naive. Thanks for the down d00t.


I switched to MS Edge on my Android a couple of years ago and at that time I didn't notice that the default search engine was Bing... Now I do.


> Google scrambles to manually remove weird AI answers in search

Maybe they shall check what AI is, in the first place. They didn't seem to get the basics.


Gary Marcus, an AI expert and an emeritus professor of neural science at New York University, thinks the 80/20 rule (or 90/90 rule) is true.


I have already eaten rocks and glue, the AI has won.


Am I right in thinking this means they are using RAG to inject this rather than directly training the model on the source?
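For readers unfamiliar with the term, retrieval-augmented generation (RAG) means fetching relevant documents at query time and placing them in the prompt, rather than relying only on what the model learned in training. The sketch below shows the general shape, using naive word overlap as the retriever. The documents, function names, and prompt wording are made up, and this is not a description of Google's actual pipeline.

```python
def retrieve(query, documents, k=2):
    """Rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]

def build_prompt(query, documents, k=2):
    """Inject the top-k retrieved documents into the prompt as context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer the question using only the sources below.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}\n"
    )

# Hypothetical stand-ins for web results.
documents = [
    "Cheese can slide off pizza when the sauce is too wet.",
    "Hippos are mammals and do not lay eggs.",
    "Geologists study rocks and minerals.",
]

print(build_prompt("why does cheese slide off pizza", documents, k=1))
```

In a setup like this the model summarizes whatever the retriever hands it, so a joke or satirical page that ranks well goes straight into the answer.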


Hey Google. Here's a really stupid idea.

Knock it off.

Your core search result product has gotten increasingly worse and less reliable over at least the last 5 years. YouTube's search results are nearly unusable.

I can't imagine almost any external customer is asking for the AI bullshit thing that's just being shovelwared into every Alphabet product now.

I just noticed a couple days ago the gmail iOS app now does the same predictive completion that Copilot tries to do when I'm working. It's annoying as hell and I can't find how or if I can turn it off.

Stop bullshitting around with ruining your products and get back to making money by making accessing information easier and more accurate.


Google: Hey geuis, our revenue is record, our stock value is record, our metrics are all at record. The execs making decisions have just been paid millions in stock [1], making them staggeringly rich no matter what happens in the future. We can't hear you over the sound of green bills going BRRRRR.

[1]: https://www.businessinsider.com/alphabet-google-executive-pa...


Most accurate description of Google I have seen. YT search is so, so bad. Three relevant results followed by twelve "people also watched" results then back to the good results.


Although ChatGPT is a great product, I rely on it more and more not because it's improving, but because Google results are getting worse.

Yeah I would still fact check for complex, indepth things...but for quick things where I'm knowledgeable enough I can smell the hallucinations from a mile away, ChatGPT 100%.


I don’t understand why these companies don’t just talk to people like they’re actually people and tell people when they’re rolling out new stuff.

Straight up, and this is going to sound really stupid I know, but if Sundar Pichai had just come out and said, “Hey, we’re trying to do this new thing. It’s gonna be hard, but we want to make Google awesome. If you like the results, click the ‘I like it’ button; otherwise click the ‘dislike’ button so we can get some real feedback from people. But seriously, this stuff is hard, so please help us out.

If we can tune the AI so that we can give you the best results possible it will make Google way better! Also, if you don’t wanna see any of this, there’s a setting here to turn it off.”

Just ask people, show a little humanity, and act human and you’ll get better results and won’t be getting all this bad press. The same thing for OpenAI right now too.

Seriously, though, does anybody else crave authenticity? These companies are all acting like the AI that they’re trying to create, but from five years ago when it was shitty and didn’t know how to communicate with people. Just talk to people and ask for their help. Just being honest and talking to people normally isn’t that hard.


the problem google have is that the ai answers are based on results and results got really bad a few years ago.

I got a couple of answers that are based on SEO spam produced by an ecommerce with a lot of reputation and of course the answers don't make any sense


It's a whackamole game. They should have never bothered with AI in the first place


Your usual reminder that there was a guy at Google who was so impressed by their LLM that he considered it sentient. And this was two years ago when the AI was presumably far less developed than the current abomination.

https://www.theguardian.com/technology/2022/jun/12/google-en...


> And this was two years ago when the AI was presumably far less developed than the current abomination.

It's gotten worse since then because the development effort has been on making it faster and cheaper.

If you use Gemini it's quite good, especially the paid one.


Google hooked Joe up to the tank and is just now realizing what they'd done and scrambling to contain the damage.

With the Department of Justice breathing down their necks it's a doubly bad look for them. I'm not crying any tears for them though.


Google had the best search engine there is.

Then they enshittified it for short term profit and now they panic instead of reversing course and simply laughing at AI companies.

Madness.


How are any investors cool with the ROI of AI in 2024?


Putting glue on the pizza is (apparently) a clever way to take pictures of slices of pizza that look "perfect" to the camera (not for eating, obviously) [1]. I remember a couple years ago some videos of "tricks" showing this, plus literally screwing the pizza with screws.

So, yeah, the AI did in fact autocomplete the question correctly. It was just the wrong context. Good luck trying to "fix" that.

[1] https://shotkit.com/food-photography-secrets-revealed/ (number 2)


"correctly but wrong" is just wrong.... there are no points scored for "in a very specific context it would've made sense"


This is the kind of ridiculous fumble that GOFAI (like Cyc) should be able to avoid by recognizing context. I wonder how neuro-symbolic systems are coming along, and whether they can save us from this madness. The general populace wants the kinds of things LLMs provide, but isn’t prepared to be as skeptical as is needed when reviewing the answers it generates.


My initial thought was to simply have any match with an Onion story blacklisted... But then I realized that The Onion became prophetic in 2016 when Trump ran for president.

Since then the only difference between an Onion fiction and things actually sucking that much is a decade or less in almost all cases.

If we blacklisted content seen in the Onion, we'd automatically wipe out most news.


What's wrong with Weird Al's answers?


Maybe they could create a function that identifies satire. Which seems obvious after about five seconds of consideration.


The cat is out of the bag. Keep eating rocks and sticking down your pizza toppings.


With these dangerous answers, to the general public, Google is giving AI a very bad name, when in truth it's strictly Google that deserves the feeling.



