(I work at OpenAI.)

It's really how it works.


> We’re moving toward a world where every job will be modeled

After an OpenAI launch, I think it's important to take one's feelings about the future impact of the technology with a HUGE grain of salt. OpenAI are masters of hype. They have been generating hype for years now, yet the real-world impacts remain modest so far.

Do you remember when they teased GPT-2 as "too dangerous" for public access? I do. Yet we now have Llama 3 in the wild, which even at the smaller 8B size is about as powerful as the [edit: 6/13/23] GPT-4 release.

As someone pointed out elsewhere in the comments, a logistic curve looks exponential in the beginning, before it approaches saturation. And logistic curves are far more common than true exponentials, especially in ML. It's telling that GPT-4o doesn't show much of an improvement in "reasoning" strength.
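
To make that concrete, here's a quick back-of-the-envelope check, a minimal sketch in plain Python with made-up parameters, showing how closely a logistic curve tracks a pure exponential in its early phase:

    import math

    def logistic(x, L=1.0, k=1.0, x0=10.0):
        # Standard logistic: saturates at L, with its midpoint at x0
        return L / (1.0 + math.exp(-k * (x - x0)))

    def early_exponential(x, k=1.0, x0=10.0):
        # The exponential the logistic resembles while far below its midpoint
        return math.exp(k * (x - x0))

    for x in range(8):
        print(x, f"{logistic(x):.6f}", f"{early_exponential(x):.6f}")

Well before the midpoint the two values agree to within a couple of percent; only as x approaches x0 does the logistic bend toward its ceiling. That's exactly why early data points can't tell you which curve you're on.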


The amount of negativity in these comments is astounding. Congrats to the teams at Google on what they have built, and here's hoping for more competition and progress in this space.

We've had voice input and voice output with computers for a long time, but it's never felt like spoken conversation. At best it's a series of separate voice notes. It feels more like texting than talking.

These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.


I think the modern American supermarket would blow the minds of anyone born before 1900 more than any other marvel that exists.

You have blueberries for sale in January??? A variety box of tea from 7 different countries? A wall of spices? Pineapples? Packaging made from aluminum that is just thrown away? The bread isn't full of sand and grit? And it's sliced!!!

All relatively affordable and accessible to the average person.


The most impressive part is that the voice uses the right emotion and tone throughout the presentation. I'm not sure how much of that comes from their having rehearsed this over and over, but it is really hard to get right, so if they didn't fake it in some way I'd say that is revolutionary.

The usual critics will quickly point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on -- blah, blah, blah, stochastic parrots, blah, blah, blah.

Ignore the critics. Watch the demos. Play with it.

This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's "2001: A Space Odyssey" feels... oddly primitive by comparison. I'm impressed at how well this works.

Well-deserved congratulations to everyone at OpenAI!


> This stuff feels magical. Magical.

Because its capacities are focused on exactly the right place to feel magical. Which isn’t to say that there isn’t real utility, but language (written, and even more so spoken) has enormous emotional resonance for humans, so this is laser-targeted in an area where every advance is going to “feel magical” whether or not it moves the needle much on practical utility; it’s not unlike the effect of TV news making you feel informed, even though time spent watching it negatively correlates with understanding of current events.


> (I work at OpenAI.)

Winner of the 'understatement of the week' award (and it's only Monday).

Also top contender in the 'technically correct' category.


Interesting: both Karpathy and Sutskever are gone from OpenAI now. Looks like it's the Sam Altman and Greg Brockman show.

I have to admit, of the four, Karpathy and Sutskever were the two I was most impressed with. I hope they go on to do great things.


The license is not good: https://falconllm-staging.tii.ae/falcon-2-terms-and-conditio...

It's a modified Apache 2 license with extra clauses that include a requirement to abide by their acceptable use policy, hosted here: https://falconllm-staging.tii.ae/falcon-2-acceptable-use-pol...

But... that modified Apache 2 license says the following:

"The Acceptable Use Policy may be updated from time to time. You should monitor the web address at which the Acceptable Use Policy is hosted to ensure that your use of the Work or any Derivative Work complies with the updated Acceptable Use Policy."

So no matter what you think of their current AUP, they reserve the right to update it to anything they like in the future, and you'll have to abide by the new one!

Great example of why I don't like the trend of calling licenses like this "open source" when they aren't compatible with the OSI definition.


This is a very cool demo - if you dig deeper there’s a clip of them having a “blind” AI talk to another AI with live camera input to ask it to explain what it’s seeing. Then they, together, sing a song about what they’re looking at, alternating each line, and rhyming with one another. Given all of the isolated capabilities of AI, this isn’t particularly surprising, but seeing it all work together in real time is pretty incredible.

But it’s not scary. It’s… marvelous, cringey, uncomfortable, awe-inspiring. What’s scary is not what AI can currently do, but what we expect from it. Can it do math yet? Can it play chess? Can it write entire apps from scratch? Can it just do my entire job for me?

We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just... training data.


There have been three updates to the zones in the past 50 years. Some of the updates are due to better accuracy after years of collecting data, but the elephant in the room is climate change. Where I live, winters are 4.5 degrees warmer. It has definitely affected my gardening.

I found these videos quite hard to watch. There's a level of cringe to them that I find a bit unpleasant.

It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.


I skimmed through the article but didn’t find mention of one glaring deficiency in iPadOS — it still doesn’t support multiple users and multi-user switching, even though the hardware is capable of it (and exceeds that of many Macs before it). I decided several years ago that I’m not buying another iPad until this is sorted out in iPadOS.

I think of iPhones as personal devices, where each person may have their own. But iPads are more likely to be shared for personal use in families. The fact that each person using it cannot have their own user profiles, app data, etc., is a huge drawback. Apple has supported this for a long time (though probably not in the best way) for education, but it’s not available to others. Even tvOS supports switching between user profiles quickly.

Apple enforcing the idea that iPad (with iPadOS) should also be a personal device — one device per person — makes the user experience quite poor.


The system works! Just raise your concerns and they'll get around to it in [checks notes] 18 years

https://twitter.com/cperciva/status/1785402732976992417


Every eInk controller sucks. This person took it upon themselves to fix that, and released the result, which is now the state of the art, as open source hardware.

I love people and projects like this.


A few months ago there were articles going around about how Samsung Galaxy phones were upscaling images of the Moon using AI [0]. Essentially, the model was artificially adding landmarks and details based on its training set when the real image quality was too poor to make out details.

Needless to say, AI upscaling as described in this article would be a nightmare for radiologists. 90% of radiology is confirming the absence of disease when image quality is high, and asking for complementary studies when image quality is low. With AI enhanced images that look "normal", how can the radiologist ever say "I can confirm there is no brain bleed" when the computer might be incorrectly adding "normal" details when compensating for poor image quality?

[0] - https://news.ycombinator.com/item?id=35136167


A Google search for practically any long-tail keyword will reveal that LLMs have already had a very significant impact. DuckDuckGo has suffered even more. Social media is absolutely lousy with AI-powered fraud of varying degrees of sophistication.

It's glib to dismiss safety concerns because we haven't all turned into paperclips yet. LLMs and image gen models are having real effects now.

We're already at a point where AI can generate text and images that will fool a lot of people a lot of the time. For every college-educated young person smugly pointing out that they aren't fooled by an image with six-fingered hands, there are far more people who had marginal media literacy to begin with and are now almost defenceless against a tidal wave of hyper-scaleable deception.

We're already at a point where we're counselling elders to ignore late-night messages from people claiming to be a relative in need of an urgent wire transfer. What defences do we have when an LLM will be able to have a completely fluent, natural-sounding conversation in someone else's voice? I'm not confident that I'd be able to distinguish GPT-4o from a human speaker in the best of circumstances and I'm almost certain that I could be fooled if I'm hurried, distracted, sleep deprived or otherwise impaired.

Regardless of any future impacts on the labour market or any hypothesised X-risks, I think we should be very worried about the immediate risks to trust and social cohesion. An awful lot of people are turning into paranoid weirdos at the moment and I don't particularly blame them, but I can see things getting seriously ugly if we can't abate that trend.


I'm ceaselessly amazed at people's capacity for impatience. I mean, when GPT-4 came out, I was like "holy f, this is magic!!" How quickly we get used to that magic and demand more.

Especially since this demo is extremely impressive given the voice capabilities, yet still the reaction is, essentially, "But what about AGI??!!" Seriously, take a breather. Never before in my entire career have I seen technology advance at such a breakneck speed - don't forget transformers were only invented 7 years ago. So yes, there will be some ups and downs, but I couldn't help but laugh at the thought that "14 months" is seen as a long time...


Years ago, over a decade ago now, I was a .NET developer. Microsoft introduced Entity Framework, their new way of handling data in .NET applications. Promises made, promises believed, we all used it. I was especially glad of lazy loading, where I didn't have to load data from the database into my memory structures; the system would do that automatically. I could write my code as if all my memory structures were populated and not worry about it. Except it didn't work consistently. Every now and again a memory structure would not be populated, for no apparent reason. Digging deep into TechNet, I found a small note saying "if this happens, then you can check whether the data has been loaded by checking the value of this flag and manually loading it if necessary" [0] (roughly the pattern sketched after the footnotes). So, in other words, I had to manually load all my data because I couldn't trust EF to do it for me. [1]

Long analogy short, this is where I think AI for coding is now. It gets things wrong enough that I have to manually check everything it does and correct it, to the point where I might as well just do it myself in the first place. This might not always be the case, but that's where I feel it is right now.

[0] Entity Framework has moved on a lot since then, and apparently now can be trusted to lazily load data. I don't know because...

[1] I spat the dummy, replaced Windows with Linux, and started learning Go. Which does exactly what it says it does, with no magic. Exactly what I needed, and I still love Go for this.
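
For anyone who never lived through this, here's a minimal sketch of that workaround pattern in Python, with invented names; it is emphatically not the actual Entity Framework API, just the shape of the "check a flag, load manually" dance described above.

    class LazyRef:
        """Wraps a foreign key and loads the related record on demand."""

        def __init__(self, load_fn, key):
            self._load_fn = load_fn  # function that actually hits the database
            self._key = key
            self.is_loaded = False   # the flag the docs told you to check
            self._value = None

        def get(self):
            # The manual guard: check the flag, load it yourself if necessary
            if not self.is_loaded:
                self._value = self._load_fn(self._key)
                self.is_loaded = True
            return self._value

Once every access site needs that guard, "transparent" lazy loading is no longer transparent, which is the point of the analogy.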


> Email addresses published on webpages usually need to be protected from email-harvesting spambots.

Do they though?

I have had my email address published on my website in an <a href="mailto:… link for like 20 years, and I don't get spam that makes it past the spam filter.

I use both Gmail and (for some other addresses) a webmail service hosted by a local company that uses some other filter. Both work well, so it's not something only Google can do.


I think the live demo that happened on the livestream is the best way to get a feel for this model[0].

I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.

Really, just watch the live demo. I linked directly to where it starts.

Importantly, this makes the interaction a lot more "human-like".

[0]: https://youtu.be/DQacCB9tDaw?t=557


They are admitting[1] that the new model is the gpt2-chatbot that we have seen before[2]. As many highlighted there, the model is not a GPT-3-to-GPT-4-style improvement. I tested a bunch of programming tasks and it was not that much better.

It's interesting that OpenAI is highlighting the Elo score instead of showing results on the many, many benchmarks where all models are stuck at 50-70% success. (For what an Elo gap actually translates to, see the sketch after the links.)

[1] https://twitter.com/LiamFedus/status/1790064963966370209

[2] https://news.ycombinator.com/item?id=40199715
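
For context on what an Elo gap even means: arena-style leaderboards use the familiar chess Elo expectation formula, which maps a rating difference to a head-to-head win probability. A rough illustration with made-up ratings:

    def expected_win_rate(rating_a, rating_b):
        # Standard Elo expectation: probability that A beats B
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    # A 50-point Elo lead is only about a 57% head-to-head win rate
    print(f"{expected_win_rate(1300, 1250):.2f}")  # ~0.57

So a headline Elo lead can mean winning only modestly more than half of pairwise comparisons, which says little about the absolute benchmark ceilings mentioned above.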


Narrator: A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.

Business woman on plane: Are there a lot of these kinds of accidents?

Narrator: You wouldn't believe.

Business woman on plane: Which car company do you work for?

Narrator: A major one.


All these demo-style ads/videos are super jarring and uncanny-valley-esque to watch as an Australian. US corporate cultural norms are super bizarre to the rest of the world, and the California-based holy omega of tech companies really takes this to the extreme. The application might work well if you interact with it like a normal human being, but I can't tell, because this presentation is corporate robots talking to machine robots.

I wanted to create a voice assistant that is completely offline and doesn't require any internet connection, to ensure that the user's privacy is protected and that their data is never sent to any third-party servers.

Props, and thank you for this.


The top six science people are long gone. OpenAI is run by marketing, business, software, and productization people.

When the next wave of deep learning innovations sweeps the world, Microsoft will eat what's left of them. They make lots of money, but they don't have a future unless they replace what they lost.


It doesn't have hands.

It brings to mind the story of how, when Boris Yeltsin was visiting the US, he took an impromptu detour to a random American supermarket to try to catch them off guard, only to be blown away that Americans really did have supermarkets everywhere, practically overflowing with food. The story goes that the experience played a big role in shaping his vision for Russia when he went on to become its first freely elected leader a few years later.

https://www.cato.org/blog/happy-yeltsin-supermarket-day

Or similarly, there's the story of the Lykov family, who lived cut off from humanity for 40 years but still somewhat understood what the new, moving "stars" in the night sky must be: https://www.smithsonianmag.com/history/for-40-years-this-rus...

Edit - Plus, this quote: “What amazed him most of all,” Peskov recorded, “was a transparent cellophane package. ‘Lord, what have they thought up—it is glass, but it crumples!’”

