VASA-1: Lifelike audio-driven talking faces generated in real time (microsoft.com)
340 points by vyrotek 13 days ago | 157 comments






And it's only going to get faster, better, easier, cheaper.[a]

Meanwhile, yesterday my credit card company asked me if I wanted to use voice authentication for verifying my identity "more securely" on the phone. Surely the company spent many millions of dollars to enable this new security-theater feature.

It raises the question: Is every single executive and manager at my credit card company completely unaware that right now anyone can clone anyone else's voice by obtaining a short sample audio clip taken from any social network? If anyone is aware, why is the company acting like this?

Corporate America is so far behind the times it's not even funny.

---

[a] With apologies to Daft Punk.


> Is every single executive and manager at my credit card company completely unaware that right now anyone can clone anyone else's voice by obtaining a short sample audio clip taken from any social network?

Your mistake is assuming the company cares. The "company" is a hundred different disjointed departments that only care about not getting caught Equifax-style (or filing for bankruptcy if caught). If the marketing director sees a shiny new thing that might boost some random KPI, they may not really care about security.

However, on the off chance that your bank is actually half decent, I'd suggest contacting their IT/security teams about your concerns. Maybe you'll save some folks from getting scammed?


Also, this feature is probably just some mid-level exec's plan for a bonus, not something rigorously reviewed and planned. It's also probably been in the pipeline for a decade, so if they don't push it out, they suddenly get no bonus, since nobody gets a bonus for cancelling a project.

Corporations are ultimately no better than governments and likely worse depending on what their regulatory environment looks like.


There’s a really important thing here for anyone trying to do sales to big companies.

Find an exec that needs a project to advance their career. Make your software that project.

Suck as many other execs into the project as possible, so their careers become coupled to getting your software rolled out.


That's clever!

That scene from Sneakers would be so different nowadays. [1]

"My voice is my passport. Verify me." [2]

1. https://youtu.be/WdcIqFOc2UE?si=Df3DtSakatp9eD0L

2. https://youtu.be/-zVgWpVXb64?si=yT2GZpb7E2yZoEYl


Any time you add a "new" security gate to your product, it should be in addition to the existing gates, not instead of them. Biometrics should not replace username/password; they should be in addition to them. Security questions like "What was your first pet's name?" should not get anyone in through the back door. SMS verification alone should not allow you to reset your password. Same with this voice authentication stuff: it should be another layer, not a replacement for your actual credentials.

If you treat it as OR instead of AND, then your security is only as good as the worst link in the chain.
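
To illustrate with a toy sketch in Python (the factor checks are placeholders, not a real auth API):

    # Composing auth factors: AND keeps defense in depth, OR degrades to
    # the weakest link. Factor names here are made up for illustration.
    from typing import Callable

    Factor = Callable[[], bool]

    def authenticate_and(factors: list[Factor]) -> bool:
        # Every factor must independently pass.
        return all(check() for check in factors)

    def authenticate_or(factors: list[Factor]) -> bool:
        # One spoofed factor (say, a cloned voice) is enough to get in.
        return any(check() for check in factors)

    voice_ok, password_ok = True, False  # attacker cloned the voice only
    print(authenticate_or([lambda: voice_ok, lambda: password_ok]))   # True: breached
    print(authenticate_and([lambda: voice_ok, lambda: password_ok]))  # False: held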


If you make your product sufficiently inconvenient, then you'll have the unassailable security posture of having no users.

Yes, they are aware, and they also know it isn't foolproof, so the voiceprint isn't the only information being compared. Some services check the calling number against live activity on the PSTN (the subscriber's phone not being in an active call while their number is presented as the caller ID is one such metric). Many of the deep-fake generators with public access put watermarks in the audio. The audio-stream comparison goes further: the caller needs to speak like you, down to word and phrase choices. There are other fingerprints of generated audio that you can't hear but that are still obvious, at least for the moment. With security, it's always cat and mouse with fraudsters on one hand and the effort/frustration of customers on the other.

Asking customers questions that they don't remember, and that fraudsters have in front of them, isn't working, and the time it takes for agents to authenticate is very expensive.
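
For illustration only, a toy sketch of fusing several such signals into one decision; the signal names, weights, and threshold are all invented:

    # Toy fraud-scoring sketch: each independent signal adds weight, and
    # high-risk calls get routed to a human. Numbers are made up.
    RISK_WEIGHTS = {
        "caller_id_busy_elsewhere_on_pstn": 0.5,  # number presented while already in a call
        "generator_watermark_in_audio": 0.8,      # known deep-fake audio fingerprint
        "voiceprint_mismatch": 0.6,
        "phrasing_unlike_customer": 0.3,          # word choices don't match history
    }

    def call_risk(signals: dict[str, bool]) -> float:
        return sum(w for name, w in RISK_WEIGHTS.items() if signals.get(name))

    if call_risk({"generator_watermark_in_audio": True}) >= 0.7:
        print("escalate to human fraud review")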

While there is no doubt that companies will screw up with security, you are making wild accusations without reference to any evidence.


The point is lowering liability. If you choose not to use voice authentication (or whatever), it becomes easier to argue that fraud is your fault. And if you did use it, the company "is doing everything it can" and "exceeding industry standards", so it isn't their fault either. It also just makes them seem more secure to the uninitiated (the security-theater bit, yes).

Maybe one day someone will successfully argue that adding easily defeated checks lowers security, by adding friction for no reason or instilling false confidence in users at both ends.


Pretty much. You think they're smart or with it; they're just lucky fogies.

I mean, what do you want them to do? If we think their security officers are freaking out and holding meetings right now about what to do, or if they're asleep at the wheel, we'd be seeing the same thing from the outside, no?

No, because multiple companies are pushing this atm. If it was only one company I would agree, but with multiple, you'd expect at least one to back out of it.

This is good but nowhere as good as EMO https://humanaigc.github.io/emote-portrait-alive/ (https://news.ycombinator.com/item?id=39533326)

This one has too much fake-looking body movement and looks eerie/robotic/uncanny valley. The lips don't sync properly in many places, and the eye movement and overall head and body movement are not very natural at all.

While EMO looks just about perfect, mostly. The very first two videos on the EMO page are perfect examples of that. See the rap near the end to see how good EMO is at lip sync.


Another research project with 0 model release

There were some misses with EMO too, but Hepburn at the end was amazing.

This is real time!

What this is starting to reveal is that there's a clear need for some kind of chain-of-custody system that guarantees the authenticity of what we see. Nikon/Canon tried doing this in the past, but improper storage of private keys led to vulnerabilities. As far as I'm aware, it never extended to video either.

With modern secure hardware keys it may yet be possible. The difficulty is that any kind of photo/video manipulation would break the signature (and there are practical reasons to want to be able to edit videos obviously).

In an ideal world, any mutation of the content would be traceable back to the original source. But that's not an easy problem to solve.


No, we are merely returning to the pre-photography state of things where a mere printed image is not sufficient evidence for anything.

True, an image, audio clip, or video is not enough evidence to establish truth.

We still need a way to establish truth. It's important for security cameras, for politics, and for public figures. Here are some things we could start looking into.

* Cameras that sign their output (see the sketch after this list). Yes, this camera caught this video, and it hasn't been modified. This is a must for recordings used as court evidence, IMO. Otherwise framing someone for a crime is as easy as a few deep fakes and planting some DNA or fingerprints at the scene.

* People digitally signing pictures/audio/videos of themselves. Even if they digitally modified the data, it shows that they consent to having their image associated with that message. It reduces the strength of deep-fake videos as an attack vector for reputation sabotage.

* Malicious content source detection and flagging. Think email spam filter type tagging of fake content. Community notes on X would be another good example.

* Digital manipulation detection. I'm less than hopeful this will be the way in the long term, but could be used to disprove some fraud.
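
To make the camera-signing bullet concrete, here's a minimal sketch using the Python 'cryptography' package. In a real camera the private key would live in tamper-resistant hardware, not in software like this:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    device_key = Ed25519PrivateKey.generate()  # provisioned at manufacture
    device_pub = device_key.public_key()       # published/attested by the vendor

    frame = b"raw bytes of one captured image"
    signature = device_key.sign(frame)

    try:
        device_pub.verify(signature, frame)    # raises if even one byte changed
        print("authentic, unmodified capture")
    except InvalidSignature:
        print("content was altered after capture")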


Blockchains can be used for cryptographic time-stamping.
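
The mechanism itself is just a hash commitment; a minimal sketch (the file name is hypothetical), where the printed digest is what gets anchored in a chain or notarized log:

    # Publish only the SHA-256 of a file; revealing the file later proves
    # it existed, unmodified, when the hash was recorded.
    import hashlib

    with open("original_footage.mp4", "rb") as f:
        commitment = hashlib.sha256(f.read()).hexdigest()

    print(commitment)  # the 64-hex-char digest is all that needs publishing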

I've always had a suspicion that governments and large companies would prefer a world without hard cryptographic proofs. After WikiLeaks, they noticed DKIM can cause them major blowback. Somehow the general public isn't aware that all the emails were proven authentic via their DKIM signatures; even in fairly educated circles people believe the "emails were fake," though faking them isn't actually possible.
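
And anyone can re-run that verification themselves; a sketch using the third-party dkimpy package (file name hypothetical; verification fetches the signing domain's public key from DNS, so it needs network access and the key must still be published):

    import dkim  # pip install dkimpy

    with open("leaked_message.eml", "rb") as f:
        raw = f.read()

    print("DKIM signature valid:", dkim.verify(raw))  # True if untampered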


Quite the opposite: governments and large companies even explicitly run services for digital timestamping of documents. If I wanted to assert some facts in court, I'd definitely prefer having an e-document with a timestamp notarized by my local government service instead of Bitcoin, because while the cryptography is the same, it would be much simpler from a practical legal perspective, requiring less time, effort, and cost to get the court to accept it.

Signing is great, but the hard part is managing keys and trust.

Every image is an NFT?

> merely

You say this as if it were not a big deal, but losing a century's worth of authentication infrastructure/practices is a Bad Thing which will have large negative externalities.


It isn't really, though. It has been technically possible to convincingly doctor photos for some time already, gradually getting easier, cheaper, and faster over the decades, and even now the available tech has limitations; the full change is not going to happen overnight.

Pre-photography, it at least took effort, practice, and time to draw something convincing. Any skill with that much of a barrier to entry kind of automatically reduces the ability to be anonymous. And we didn't have the ability to instantaneously distribute images world-wide.

There goes the dashcam industry…

You're being downvoted, but I think the comment raises a good question: what will happen when someone gets accused of doctoring their dashcam footage? Or any footage used as evidence?

I wasn’t really kidding with my comment. I just recently used camera footage as part of an accident claim and the assessor immediately said “that wasn’t your fault, we take responsibility on behalf of the driver”.

In a few years time when (if) faking realistic footage becomes trivial, I suspect this kind of video will have a much, much higher level of scrutiny or only be accepted from certain sources such as government owned traffic cameras.


I think it's practically impossible for such a system to be globally trustworthy, due to the practical inevitability of "improper storage of private keys led to vulnerabilities" scenarios.

People will expect or require that chain of custody only if all or at least the vast majority of the content they want would have that chain of custody.

Photo/video content will have that chain of custody only if all or almost all of devices recording that content will support it - including all the cheapest mass-produced devices in reasonably widespread use anywhere in the world.

And that chain of custody provides the benefit only if literally 100% of these manufacturers have their private keys secure 100% of the time, which is simply not happening; at least one such key will leak, if not unintentionally then intentionally for some intelligence agency who wants to fake content.

And what do you do once you see a leak of the private keys used for signing the certificates for the private keys securely embedded in (for example) all of 2029 Huawei smartphones, which could be like 200 million phones? The users won't replace their phones just because of that, and you'll have all these users making content - so everyone will have to choose to either auto-block and discard everything from all those 200 million users, or permit content with a potentially fake chain of custody; and I'm totally certain that most people will prefer the latter.


Multisig by the user and camera manufacturer can help to some extent.
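
A sketch of what that composition might look like, reusing the Ed25519 primitives from upthread; the names are illustrative:

    # Content counts as authentic only if BOTH the manufacturer's device
    # key and the user's personal key signed the same bytes.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
        Ed25519PublicKey,
    )

    def dual_verify(content: bytes,
                    device_pub: Ed25519PublicKey, device_sig: bytes,
                    user_pub: Ed25519PublicKey, user_sig: bytes) -> bool:
        try:
            device_pub.verify(device_sig, content)  # camera attests the capture
            user_pub.verify(user_sig, content)      # user attests "I filmed this"
            return True
        except InvalidSignature:
            return False

    device, user = Ed25519PrivateKey.generate(), Ed25519PrivateKey.generate()
    clip = b"video bytes"
    print(dual_verify(clip, device.public_key(), device.sign(clip),
                      user.public_key(), user.sign(clip)))  # True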

Multisig requires user cooperation; many users will not care to cooperate, and chain-of-custody verification really starts working only if you can get (force) ~100% of legitimate users globally to adopt the system.

Also, for the potential creators of political fakes, such a multisig won't change things. Getting a manufacturer's key may take some effort, but getting (and "burning") the keys of a dozen random people is relatively trivial in many ways, e.g. buying them from poor people, stealing them from compromised machines, or simply issuing fake identities for state-backed actors.


I expect this type of system to be implemented in my lifetime. It will allow whistleblowers and investigative sources to be discredited or tracked down and persecuted.

Unfortunately that seems inevitable.

None of that works, it's simply theatre.

I can just take a (crypto-signed) photo of another photo.


The public blockchain would show the chain of custody/ownership, so a photo of a photo would show that the final crypto signature does not belong to the claimed owner.

You are correct that, as a viewer, I can't just rely on a crypto signature like a watermark; I'd have to verify the chain of custody. But if I wanted to do that, it would be available.


Why is this research being done? Is this some kind of arms race? The only purpose of this technology I can think of is getting spies to abuse others.

Am I going to have to do AuthN and AuthZ on every phone call and zoom now?


I get the feeling it's "someone's going to do this, so it might as well be us."

It's fascinating how research can take on a life of its own and will be pushed, by someone, to its own conclusion. Even for immensely destructive technologies (e.g., atomic weapons, viruses), the impact of a technology is its own attractor (could you say that's risk-seeking behavior?)

> Am I going to have to do AuthN and AuthZ on every phone call and zoom now?

"Alexa, I need an alibi for yesterday at noon."


> Why is this research being done?

I think it's mostly "because it can be done". These types of impressive demos have become relatively low hanging fruit in terms of how modern machine learning can be applied.

One could imagine commercial applications (VR, virtual "try before you buy", etc), but things like this can also be a flex by the AI labs, or a PhD student wanting to write a paper.


> It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.

Teams has started rolling out avatars (https://techcommunity.microsoft.com/t5/microsoft-teams-blog/...); this would be a step up. I'm not really a fan, but that doesn't mean I can dismiss the use case.


Advertising. Now you and your friends star in the streaming commercials and digital billboards near you! (whether you want to or not)

Newscasters and other talking heads will be out of business. Just pipe the script into some AI and get video.

I was thinking about this the other day: an implantable YubiKey-type device that integrates with whatever device you're using, to validate your identity for phone calls or video conferences.

Subdermal X.509, maybe with some sort of Neuralink adapter so you can confirm identity requests. Though first versions might just be a small button you need to press during the handshake.


We all know why this is really happening. Clippy 2.0.

On the other hand, if deepfaking becomes common enough that everyone stops trusting everything they read / see on the internet, it would be a net good against the spread of disinformation compared to today.

> everyone stops trusting everything

Why would you expect this to happen? Lots of people are gullible, if it were otherwise a lot of well-known politicians would be out of a job or would never have been elected to begin with.


If it's even more common than "common enough", then anyone could at least try to help their gullible friends and family by sending them a deepfake video of themselves doing or saying something they never did. A lot of people will suddenly wise up when a problem affects them directly.

I don't see that as an outcome. We have already seen a grand erosion of trust in institutions. Moving to an even lower trust society does not sound like it would have positive consequences for discourse, public policy, or society at large.

The benefit is that you can only trust in person interaction with social and governmental institutions so people will have to leave their damn house again and go talk to each other face to face. Too many of our current problems are caused by people only interacting with each other and the world through third parties who are performing a MITM operation for their own benefit.

This assumes that it's a two-way door.

Over the past century and a half, we've moved into vast, anonymous spaces, where I'm as likely to know and get along with my neighbour as I am to win the lottery.

And this is important. No, it's not just a matter of putting on an effort to learn who my neighbour is -- my neighbour is literally someone whose life experiences are wildly different, whose social outcomes will be wildly different, whose beliefs and values are wildly different, and, for all I know, goes to conferences about how to eliminate me and my kind.

(This last part is not speculation; I'm trans; see: CPAC)

And these are my reasons. My neighbour is probably equivalently terrified of me, or what I represent, or the media I consume, or the conferences that I go to.

Generalizing, you can't take a bunch of random people whose only bond is that they share meatspace-proximity, draw a circle around them, and declare them a community; those communities are _gone_, and you can no more bring them back than you can revive a corpse. (This would also probably not be a good idea, even if it were possible: they were also incredibly uncomfortable places for anyone who didn't fit in, and we have generations of fiction about people risking everything to leave for those big anonymous cities we created in step 1.)

So, here we are, dependent on technology to stay in touch with far-flung friends and lovers and family, all of us, scattered like spiderwebs across the globe, and now into the strands drips a poison.

Daniel Dennett was right. Counterfeit people are an enormous danger to civilization. Research like this should stop immediately.


Ironically, low-effort deep fakes might increase trust in organizations that have had the budget to fake stuff since their inception. The losers are "citizen journalists" broadcasting on YouTube etc.

I don't see the extinction of trust through the introduction of garbage falsehoods to be a net good.

Believing that everything you eat is poisoned is no way to live. Believing that everything you see is a lie is also no way to live.


Before photography, this was just the normal state of the world. Think a little: back then, any story or picture you saw was made by a person, and you only had their reputation to go by. Think some more and you realize that's never changed, even with pictures and video. Easy AI-generated pictures and video just remove the illusion of trust.

That's the whole issue, though: the spread of disinformation eroded trust, and furthering this into the obliteration of all trust is not a good outcome.

Because the tech for this is only a slight variation of the tech for a broad range of legitimate applications?

Because even this precise tech has legitimate use cases?

> The only purpose of this technology I can think of is getting spies to abuse others.

Can you really not think of any other use cases?


Why don't you list some legitimate and useful applications of this work? Especially at the price we and this company are paying.

Why don't you get more specific about your claims?

Jeez. I dunno. Sometimes I just reach my threshold for the time I'm prepared to spend debating with strangers on the internet.

I see lots of comments wondering what the use-cases for this technology are. It is scary, and it can and will be abused. However, there are a lot of genuine applications. We may or may not like these applications, but they are going to happen. Here are a few, just off the top of my head:

- Advertising. Where I am, there is a pervasive commercial with (very realistic) talking goats. Why not do the same thing with people? No need for actors, when you can just tell the computer what you want.

- Cinema and television. Especially for bit parts and extras, just create the characters, instead of rounding up a bunch of extras.

- Video games and alternate realities. They've been getting more and more realistic - this is just the next step.

- Pornography. Again, why trouble yourself with real actors? Sell premium videos customized to each customer.

- Politics. Give "live" speeches in different venues, without all the bother of travelling. It's a very small step to answering questions live - just train up an LLM with the responses you want it to give.

Lots more applications as well - those are just a few that come to mind.


The paper mentions it uses diffusion transformers. The open-source implementation that comes up on Google is Facebook Research's PyTorch implementation, which is under a non-commercial license: https://github.com/facebookresearch/DiT

Is there something equivalent but MIT or Apache?

I feel like diffusion transformers are key now.

I wonder if OpenAI implemented their Sora stuff from scratch or if they built on the Facebook Research diffusion transformers library. It would be interesting if they violated the non-commercial part.

Hm. Found one: https://github.com/milmor/diffusion-transformer-keras
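
For a license-unencumbered starting point, the core DiT trick (a transformer block whose LayerNorms are modulated by the conditioning signal, "adaLN-Zero") is small enough to sketch from scratch in PyTorch. This is an illustration of the published idea, not code from either repo:

    import torch
    import torch.nn as nn

    class DiTBlock(nn.Module):
        """One transformer block with adaLN-Zero conditioning."""
        def __init__(self, dim: int, heads: int):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            # Conditioning (timestep + e.g. audio/identity embedding) maps to
            # per-block scale/shift/gate parameters.
            self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))
            nn.init.zeros_(self.ada[-1].weight)  # "Zero": gates start closed,
            nn.init.zeros_(self.ada[-1].bias)    # so each block starts as identity

        def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
            s1, b1, g1, s2, b2, g2 = [t.unsqueeze(1) for t in self.ada(cond).chunk(6, -1)]
            h = self.norm1(x) * (1 + s1) + b1
            x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
            h = self.norm2(x) * (1 + s2) + b2
            return x + g2 * self.mlp(h)

    block = DiTBlock(dim=256, heads=8)
    tokens = torch.randn(2, 16, 256)  # (batch, latent patches, dim)
    cond = torch.randn(2, 256)        # pooled conditioning vector
    print(block(tokens, cond).shape)  # torch.Size([2, 16, 256])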


This is absolutely crazy. And it'll only get better from here. Imagine "VASA-9" or whatever.

I thought deepfakes were still quite a bit away, but after this I will have to be way more careful online. It's not far from being something that can show up in your YouTube Shorts feed and trick you if you didn't already know it was AI.


This is good but nowhere as good as EMO https://humanaigc.github.io/emote-portrait-alive/ (https://news.ycombinator.com/item?id=39533326)

This one has too much movement and looks eerie/robotic/uncanny valley, while EMO looks just about perfect.


Hard disagree. I think you might be misremembering how EMO looks in practice. I'm sure we'll learn VASA-1 "telltales", but to my eyes there are far fewer than EMO's: zero of the EMO videos were "perfect" for me, and many show little glitches or missing sync. VASA-1 still blinks a bit more than I think is natural, but it looks much more fluid.

Both are, BTW, AMAZING!! Pretty crazy.


In VASA there is way too much body movement, not just of the head, as if the camera is moving in strong wind. EMO is a lot more human-like. In the very first video on the EMO page I still cannot see that it's a generated video; it's that real. The lip movements and the expressions are in almost perfect sync with the voice. That is absolutely not the case with VASA.

If you see talking heads with static/simple/blurred backgrounds from now on, assume they're fake. In the near future the fakes will come with realistic backgrounds and be even less detectable, and we will have to assume any video could be faked.

I wonder how video evidence in court is going to be affected by this. Both from a defense and prosecution perspective.

Technically, videos could have been faked before, but it required a ton of effort and skill that no average person has.


Just as before, a major part of photo or video evidence in court is not the actual video itself, but a person testifying "on that day I saw this horrible event, where these things happened, and here's attached evidence that I filmed which illustrates some details of what I saw." - which would be a valid consideration even without the photo/video, but the added details do obviously help.

Courts already wouldn't generally approve random footage without clear provenance.


There will be a new cottage industry of AI detectives who serve as expert witnesses and attest to the originality of media in court.

I still find the faces themselves to be really obviously wrong. The sound is just off, close enough to tell who is being imitated but not particularly good.

Especially the hair "physics" and sometimes the teeth shift around a bit.

But that's nitpicking. It's good enough to fool someone not watching too closely. And the fact that the result is this good with a single photo is truly astonishing, we used to have to train models on thousands of photos for days only to end up with a worse result!


It's interesting to me that some of the long-standing things are still there. For example, lots of people with an earring in only one ear, unlikely asymmetry in the shape or size of their ears, etc.

I like the considerations topic.

There's likely also an unsaid statement: this is for us only, and we'll be the only ones making money from it, with our definition of "safety" and "positive".


So an ugly person will be able to present his or her ideas on the same visual level as a beautiful person. Is this some sort of democratization?

> To show off the model, Microsoft created a VASA-1 research page featuring many sample videos of the tool in action

With AI stuff, I have learned to be very skeptical until and unless a relatively publicly accessible demo with user specified inputs is available.

It is way too easy for humans to cherry-pick the nice outputs, or to take advantage of biases in the training data to generate nice outputs, and that is not at all reflective of how it holds up in the real world.

Part of the reason why ChatGPT, Stable Diffusion, and DALL-E had such an impact is that people could try them and see for themselves, without being told how awesome they were by the people making them.


Wow. Just...wow. I watched one of the videos very carefully. Absolutely natural movements of eyes, eyebrows, lips, even the hair resting on the woman's shoulders. The text even contained a double entendre, and the facial expression was exactly right.

Of course, I'm sure that whoever put these demos together also invested time in getting them right. Still: so would anyone seeking to put words in another person's mouth.

Imagine your favorite (or least favorite) politician coming out with a speech promoting something awful. Even if the video were immediately debunked, people would remember it. And many people would never believe the debunking.

This is the world we now live in: you literally cannot trust anything you see online.


Oh no. "Cameras on please!" will be replaced by "AI generated faces off please!" in teams.

Oh god don't watch their teeth! Proper creepy.

Still, apart from the teeth this looks extremely convincing!


The teeth resizing dynamically is incredibly distracting, or more positively, a nice way to identify fakes. For now.

Yeah; the teeth, the tongue movement and lack of tongue shape, and the "stretching" of the skin around the cheeks pushed the videos right into the uncanny valley for me.

My first thought was "oh no the interview fakes", but then I realized - what if they just kept using the face? Would I care?

It would be interesting: a remote candidate could easily present as whatever ethnicity, age, or even gender they consider most beneficial for hiring, to avoid discrimination or fit certain diversity incentives.

Tech like this has the potential to bring us back to the days of "on the Internet, nobody knows you're a dog": https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_...


Yeah, even if they just use LLMs to do all the work, or are an LLM themselves, as long as they can do the work I guess.

Weird implications for various regulations though.


Despite vast investment in AI by VCs and vast numbers of startups in the field, these sorts of things remain unavailable as simple consumer-installable software.

Every second day HN has some post about some amazing new AI system, never available to download, run, and use.

Why the vast investment and no startup selling consumer downloadable software to do it?


A fantastic technological advance for election interference!

Such an exciting startup idea! I'm thrilled!

As if this technology was needed.


The people that killed my family used this to maintain the illusion that they were alive for like 4 years, and to extract information, etc. On one hand it was nice to see them, but on the other it was a very odd feeling, talking to them knowing they were dead. (Will for sure get downvoted, but idk, a trippy, interesting Skynet moment the usual crowd on HN will never experience.)

We need some clear legislation around this right now.

counterpoint: we don't need any more legislation

I tend towards agreeing with you. Many of the problems, like impersonation, are already illegal.

And replacing a person who spreads lies, as can be seen in most TV or glossy-cover ads, shouldn't trigger some new legal action. The only difference is that now the actor is also a lie.

And countries which use actors or news anchors for spreading propaganda surely won't see an issue with replacing them with AI characters.

People who get to read that their most favorite, stunningly beautiful Instagram or TikTok influencer is nothing but a fat, chips-eating, ugly person using AI may try to raise some legal issues to soothe their disappointment. They might then raise a point which sounds reasonable, but which would force politicians to also tackle the lies spread in TV/magazine ads.

Maybe clearly labeling any use of this tech, perhaps with a QR code linking to the owner of the AI (similar to the QR codes on meat packaging that let you trace the origin of the meat), would be something laws could help with, in the spirit of transparency.


Legislation only impairs the good guys.

In which jurisdiction?

What jurisdiction would not benefit from legislation around duplicating people's identities using AI?

The GPU requirements for realtime video generation are very minimal in the grand scheme of things. An assault on reality itself.

Whoa, so far this isn't in the news. This is the only article: https://www.ai-gen.blog/2024/04/microsoft-vasa-1-ai-technolo...

Anyone have any good ideas for how we're going to do politics now?

Today a big ML model can do this and it's somewhat regulatable; tomorrow people will do this on their contact-lens supercomputers, and anyone will be able to generate a video of anything.

Is going back to personally knowing your local representative the only way? How will we vote for national candidates if nobody knows what they think or say?


We already rely on chains of trust going back to the original source, and we still will. I find these alarmist posts a bit mystifying. Before photography, anyone could fake a quote of anyone, and human civilisation got quite far. We had a bit over a hundred years where photographic-quality images were possible and very hard to fake (which did and still does vary with technology), but clearly now we're past that. We'll manage!

The issue is better phrased as “how will we survive the transition while some folk still believe the video they are seeing is irrefutable proof the event happened?”

In the before times we didn't have social media, with its algorithms and reach. Does it matter that the chains of trust debunk a viral lie 24 hours after it has spread? Not that there's a lot of trust in the chains of trust to begin with. And if you still have trust, then you're not the target of the viral lie. And how long can you hold on to that trust when the lies keep coming 24/7, one after another, without end? As one movie critic once put it: you might not have noticed it, but your brain did. Very malleable, this brain of ours.

The civilization might be fine, sure. Now, democracy, on the other hand...


Presidential elections are frequently pretty close. Taking the electoral college into account (not the popular vote, which doesn't matter) Donald Trump won the 2016 election by a grand total of ~80,000 votes in three states[0].

Knowing that retractions rarely get viral exposure, it's not difficult to imagine that a few sufficiently-viral videos could swing enough votes to impact a presidential election. Especially when considering that the average person is not up to speed on the current state of the tech, and so has not been prompted to build up the mindset that's required to fend off this new threat.

[0] https://www.washingtonpost.com/news/the-fix/wp/2016/12/01/do...


Plausible. I was thinking over the longer-term.

Yeah I mean tabloids have been fooling people with doctored photos for decades.

Potentially we'll need slightly tighter regulations on the formal press (so that people who care about accurate information have a place to get it), and we'll definitely want to steer the culture back towards holding it accountable for misinformation. But credulous people have always had easy access to bad information.

I'm much more worried about the potential abuse cases that involve ordinary people who aren't public figures and have much less ability to defend themselves. Heck, even celebrities are more vulnerable targets than politicians.



Didn't see that one. Pretty cool; not as good as EMO or VASA, but pretty good.

People already believe any quote you slap on a JPEG.

> Anyone have any good ideas for how we're going to do politics now?

If a business is showing a demo of this you can be assured that the Government already has this tech and has for a period of time.

> How will we vote for national candidates if nobody knows what they think or say?

You don't know what they think or say now - hopefully this disabuses people of this notion.


> If a business is showing a demo of this you can be assured that the Government already has this tech and has for a period of time.

That may have been true once upon a time, but it no longer is. And even where it was true, it was mostly in niche areas like cryptanalysis.

Governments simply cannot attract or keep the level of talent required to have been far ahead of industry on LLMs and similar tech, especially not with the huge difference in salaries and working conditions.


DNS? Might be that we need a radical (for some) change of viewpoint.

Just as there's no privacy on the internet, how about "there's very little trust on the internet"? Assume everything not securely signed by a trusted party is false.


A large number of people don't really care about verifying whether what they've heard is true before repeating it, eventually making it "fact" amongst themselves.

Hell, I've been guilty of spouting BS before, just because I'd heard something from so many people, only to find, when I look it up, that it's not true.

It's not really a tech problem, it's more of a human problem imo, like so many others. But there is literally nothing we can do about it.


Same way we've always done it: largely ignorant and apathetic masses that only care about waving their team's flag and don't give a damn about most of their team's policies, as long as they can still say X, Y, and Z at the Christmas dinner table.

Democracy is already an illusion of choice anyway; just look at the candidates. It's gonna be Biden v Trump _again_. For the London mayoral election, Sadiq is pretty much guaranteed to get in _again_. For the UK general election it's gonna be the typical Tories v Labour BS _again_, with no fresh young candidates with new ideas.

Democracy is rotting everywhere it exists thanks to the idea of parties, party politics and the human need to pick a tribe and attack every other tribe.


> How will we vote for national candidates if nobody knows what they think or say?

i’m going to burst your bubble here, but most voters have no idea about policies or candidates. most voters vote based on inertia or minimal cues, not on policies or candidates.

i suggest you look up “The American Voter”, “The Democratic Dilemma: Can Citizens Learn What They Need to Know?” and “American National Election Studies”.


Hyper targeted placement of generated content designed to entice you to donate to political campaigns and to vote. Perhaps leading to a point where entire video clips are generated for a single viewer. Politicians and political commentators will lease their likeness and voice out for targeted messaging to be generated using their likeness. Less reputable platforms will allow disinformation campaigns to spread.

People in my circles have been saying this for a few years now, and we've yet to see it happen.

I've got my popcorn ready.

But you can rest easy. Everyone just votes for the candidate their party picked, anyway.


It'll happen - deepfakes aren't good enough yet. But when they become ubiquitous and hard to spot, it'll be chaos until the average person is mentally inoculated against believing any video / anything on the internet.

I wonder if it's possible to digitally sign footage as it's captured? It'd be nice to have some share-able demonstrably true media.
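
It's possible in principle. One sketch: hash-chain the recording in fixed-size chunks as it's written and sign the running digest, so any later cut or splice breaks verification (file name hypothetical; Python 'cryptography' package):

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def sign_capture(path: str, key: Ed25519PrivateKey, chunk_size: int = 1 << 20) -> bytes:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                digest.update(chunk)  # a camera could also sign interim digests
        return key.sign(digest.digest())

    camera_key = Ed25519PrivateKey.generate()
    sig = sign_capture("dashcam_clip.mp4", camera_key)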

Edit: I'm a centrist and I definitely would lean one way or the other based on who the options are (or who I think they are).


“We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

> until we are certain that the technology will be used responsibly ...

That's basically "never" then, so we'll see how long they hold out.

Scammers are already using the existing voice/image/video generation apparently fairly successfully. :(


Having a delay, where people can see what's coming down the pipe, does have value. In a year there may (will) be an open-source model.

But knowing that this is possible is important to know.

I'm fairly clued in, and am constantly surprised at how fast things are changing.


> But knowing that this is possible ...

Who knows that this is possible?

The average elderly person isn't going to know any time soon. The SV IT people probably will.

It's not an even distribution of knowledge. ;/


Eventually someone will implement one of these really good recent ones as open source, and then it will be on Replicate etc. Right now the open-source ones like SadTalker and Video ReTalking are not live and are unconvincing.

/s it doesn’t have the phrase LLM in the title

Translation: "We're attempting to preserve our moat, and this is the correct PR blurb. We'll release an API once we're far enough ahead and extracted enough money."

Like somebody on Ars noted, "anybody notice it's an election year?" You don't need to release an API; all online videos are now of suspect authenticity. Somebody make a video of Trump's or Biden's eyes following the mouse cursor around. Real videos turned into fake videos.


Money will change that.

Holy shit these are really high quality and basically in realtime on a 4090. What a time to be alive.

It really is something. 40 FPS on a 4090, damn.

lol how does something like this get only 50ish votes but some hallucinating video slop generator from some of the other competitors gets thousands?

It looks all warpy and stretchy. That's not how skin and face muscles work. Looks fake to me.

I find the hair to be the least realistic; it looks elastic, which is unsurprising: highly detailed things like hair are hard to simulate with good fidelity.

I'm curious what the reason for deepfake research is, and what the practical applications are.

Can someone explain the commercial need to take someone's likeness and generate video content?

If I were an A-list celebrity, I might give permission for Coke to make a commercial with my likeness, provided I'm allowed final approval of the finished ad.

Do I have an avatar that attends my zoom work calls?


Propaganda, political manipulation, narrative nudging, regular scams and advertising.

Even though most of those things are illegal, you could just have foreign cat's-paw firms do it. Maybe you fire them for "going too far" after the damage is done, assuming someone even manages to connect the dots.


Video games, entertainment, and avatars seem like the big ones.

If that is really the reason then this is insane and everyone involved should put their keyboards down and stop what they are doing.

This would be as if we invented and sold nuclear weapons to dig out quarry mines faster. The inconvenience it saves us quickly disappears into the overwhelming shadow of the enormous harm now enabled.


> This would be as if we invented and sold nuclear weapons to dig out quarry mines faster.

”Project Plowshare was the overall United States program for the development of techniques to use nuclear explosives for peaceful construction purposes.”[0]

0: https://en.wikipedia.org/wiki/Project_Plowshare


Finding a mundane benign use for a terrible tool is good. Creating a terrible tool for a mundane benign purpose is reckless insanity.

Yeah, and it was terminated. Much harder to put this genie back in the bottle.

On the surface, it's a simple, understandable demo for the masses, while at the same time it hints at deeper commercial usage.

Disney has been using digital likenesses to maintain characters whose actors have died; Princess Leia is the most prominent example. Arguably, there is significant real value in being able to generate a human-like character that doesn't have to be recast. That character can be any age, at any time, and look exactly like the actor.

For actors and actresses, I suspect many will start licensing their image/likeness as they wind down their careers. It gives them ongoing income with very little effort.


In this case, replacing humans in service jobs. From the paper:

"Such technology holds the promise of enriching digital communication, increasing accessibility for those with communicative impairments, transforming education methods with interactive AI tutoring, and providing therapeutic support and social interaction in healthcare."

A convincing simulacrum of empathy could plausibly be the most profitable product since oil.


Entertainment maybe? I know that's not necessarily an ethical reason but some have made hilarious AI-generated songs already.

Apple Vision Pro personas competition

Thank god. I will finally be able to use my EyeMac with dignity.

If beautiful people have an advantage in the job market, maybe people will use deepfake technology when doing zoom interviews? Maybe they will use it to alter their accent?

State disinformation and propaganda campaigns.

Corporate disinformation and propaganda campaigns.

Personal disinformation and propaganda campaigns.

Oh Brave New World, that has such fake people in it!


Imagine being the CEO: you just grab your salary and options, go home, and sit in the hot tub while one of the interns carefully prompts GPT and VASA to generate you giving a speech online about strategic directions. /s

The purpose is to give remote workers the ability to clone themselves and automate their many jobs. /s

(but actually, because laziness is the driver of all innovation, I wouldn't be surprised if this happens).


Maybe making a webpage with 27 videos isn't the greatest web design idea.

It's up to your browser whether those are actually loaded all at once. E.g., on desktop Chrome with no data-saver modes enabled, it buffers the first couple of seconds of each video, then grabs the remaining MBs for a video when you play it. That way you can watch the videos as quickly as you like without actually having to load all 27 in full just because you opened the page.

The busted double scrolling sections on mobile really don't help.

AI can talk with me. Why would I need a friend in real life?

I could see this being used in movie production.

Don't know why they're not releasing it right away.

If they can do it, so can someone else, and hiding it makes things worse. If it's widely available, people will quickly realise that the talking head on YT spouting racist BS is AI. This process needs to happen faster.

Of course there will still be people who don't care or understand, but there will always be people who are, for example, racist and don't care whether the affirmation of their beliefs comes from a human or a machine.


Oh good!

I get why this is interesting, but why is it desirable?

Real Jurassic Park "too preoccupied with whether they could" vibes.


Now I can join the meeting "in a suit" while being out paddleboarding!

Cool! Now we can expect to see an endless stream of dead presidents' speeches "LIVE" from the White House.

This should end well.



