I wanted to explore different open-source Gen AI tools and brought them together to generate this YouTube video: voice, music, images, and text. Everything was processed on a single RTX 3090. The text for the video was generated using the Wikipedia article on Octopus as a source.
Nice, actually. I'd watch it, but it does not pronounce acronyms correctly (4.3 em). That shows little effort went into it (it probably took less time to make than the video's own length), and it ruins the experience.