I think LaMDA would be really fun. If you asked ChatGPT what movies it likes, it would tell you that it is a large language model trained by OpenAI and it can't have opinions yada yada yada.
The total data that the page will have to load on startup (probably using Fetch API) is:
- 74 MB for the Whisper tiny.en model
- 240 MB for the GPT-2 small model
- Web Speech API is built-in in modern browsers
Cool, but I'm now wondering what it would take to bring this down enough to put it in real apps? Anyone talking about this?
Unfortunately these smaller models also perform terribly; the GPT-2 small model in particular is really unsuitable for the task of generating text. The largest publicly available models, which are nowhere near GPT-3 Da Vinci level, are tens of GBs.
We may be able to reduce the size without sacrificing performance, but that's an area of active research still.
Isn't the Web 2.0 era the current era? I mean, the Web 3.0 era relates to blockchains only, not the rest. The proponents of "everything on blockchain" actually do want that for everything (not that it will ever work, but that's beyond our discussion).
Given Whisper is open source, I'd be surprised if it's not. It would be cool for Web Speech API's SpeechRecognition to simply use it, though that would make browser downloads a little beefier.
It could easily be downloaded separately in the background once the browser application is already up and running. Would be great to have it in the browser though for sure.
To be honest, I expect that in 10 years people will regularly use these sorts of text generation tools in the way text prediction and thesauruses and grammar checkers and spellcheckers are used today but for bigger blocks of text.
I can't really see why not anyway, as more things are in the browser it makes sense to me to integrate the ability to "AI check" your text like a grammar or spell checker to improve your writing along some dimensions that you like.
It's not honest, but in kind of the same way that a spellchecker isn't honest. Since it's going to be possible anyway, I don't see what extra harm it causes to make it accessible for everyone, so that we can actually see an upside and also begin to recognize that the text we read is at this point likely to be at least partially AI generated and potentially factually incorrect.
Even better if things like Firefox reader mode, one of my favorite tools, can also do text summarization. Just imagine the adversarial interaction between a tool designed to generate confident sounding fluff and one to summarize confident sounding fluff. Honestly it seems like a likely inevitable future path.
It may as well be part of the browser where it stands a better chance of keeping people's long term attention on the ease of using these tools. Spammers will be able to do it, fake journalists and such will be able to do it, better if we can do it too so that at least we are aware of the potential abuse.
We need much better models in browsers. The main reason is to pass everything through the language model and get polite and helpful responses. You never have to see Google, the website or the ads ever again if you don't want to. The QA model should be able to detect most undesirable parts - spam, ads, fakes, factually incorrect data. Something like chatGPT running locally. This is important for privacy. If we run the model, we have a safe creative space. If they run the model, they get everything spilled out.
Listening to that demo, it's incredible how far we've come!
Or, not.
Racter was commercially released for Mac in December 1985:
Racter strings together words according to "syntax directives", and the illusion of coherence is increased by repeated re-use of text variables. This gives the appearance that Racter can actually have a conversation with the user that makes some sense, unlike Eliza, which just spits back what you type at it. Of course, such a program has not been written to perfection yet, but Racter comes somewhat close.
Since some of the syntactical mistakes that Racter tends to make cannot be avoided, the decision was made to market the game in a humorous vein, which the marketing department at Mindscape dubbed "tongue-in-chip software" and "artificial insanity".
What's amazing is that ChatGPT, backed by GPT-3, is the first thing since then to do enough better that everyone is engaged.
I owned that in 1985, and having studied AI/ML previously I've been (and remain something of) an AGI skeptic. But now in 2022, I finally think “this changes everything” ... not because it's AI, but because it's making the application of matching probabilistic patterns across mass knowledge practical and useful for everyday work, particularly as a structured synthesis assistant.
It looks like the ChatGPT APIs that work well are the ones implemented as a browser extension, reusing the bearer token you get by signing into ChatGPT from the same browser. I'm guessing, since you're using pyttsx3, that you wrote a Python app instead, not something in the browser?
Technically this seems to work, and mad props to the author for getting to this point. On my computer (MacBook Pro) it's very slow but there are enough visual hints that it's thinking to make the wait ok. I have plenty of complaints about the output but most of that is GPT-2's problem.
It's almost the same model architecture, but GPT3 is much better trained. GPT3 is coherent, while GPT2 is prone to generating gibberish or getting stuck in a loop. The advantage is pretty significant for longer generations.
That being said, neither GPT3 nor GPT2 are "efficient" models.
On the one hand, they use inefficient architectures - from using a BPE tokenizer, to dense attention without any modifications, to being a decoder-only architecture, etc. Research has come up with many fancier ideas for making all this run better and with less compute.
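A back-of-the-envelope sketch (mine, not from the thread) of why unmodified dense attention is considered inefficient: every token attends to every other token, so the score matrix is seq_len × seq_len and compute grows quadratically with context length. The constant here is a simplification that ignores projections and softmax.

```python
# Rough FLOP estimate for one dense self-attention layer:
# QK^T scores cost ~2 * seq_len^2 * d_model FLOPs, and applying the
# attention weights to V costs roughly the same again.
def attention_flops(seq_len: int, d_model: int) -> int:
    return 4 * seq_len * seq_len * d_model

print(attention_flops(1024, 768))  # GPT-2 small's context and width
print(attention_flops(2048, 768))  # doubling the context quadruples the cost
```

This quadratic blow-up is exactly what sparse/linear attention variants try to avoid.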
But there is a reason why GPT2/3 are architecturally simple and inefficient: we know how to train these models reliably (more or less) on thousands of GPUs, whereas the same might not be true for more modern and efficient implementations. For instance, when training OPT, Facebook started out with fancier ideas but ultimately ended up going back to GPT-3-esque basics, simply because training on thousands of machines is a lot harder than it seems in theory.
On the other hand, these models have far too many parameters compared to the data they were trained on. You might say they are undertrained - or that they lean heavily on available compute to make up for missing data. In any case, much smaller models (like Chinchilla by DeepMind) match their performance with fewer parameters (and hence less compute or model size) by using more and better data.
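To make the "undertrained" point concrete, here is the commonly cited rule of thumb from the Chinchilla paper - roughly 20 training tokens per parameter for compute-optimal training (a simplification of their actual scaling-law fits):

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter.
def chinchilla_optimal_tokens(n_params: float) -> float:
    return 20 * n_params

# GPT-3 has 175B parameters but was trained on ~300B tokens;
# the rule of thumb suggests ~3.5 trillion would have been optimal.
print(f"{chinchilla_optimal_tokens(175e9):.1e}")
```

By this yardstick GPT-3 saw an order of magnitude less data than its parameter count calls for, which is why a much smaller model trained on more data can match it.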
In closing, there are better models for edge devices. This includes GPT clones like GPT-J in 8-bit, or distilled versions thereof. Similarly, there are still a lot of gains to come when all the numerous efficiency improvements get implemented in a model that operates at the data/parameter efficiency frontier.
Still, even when considering efficient models like Chinchilla, and then even more architecturally efficient versions thereof, we are talking about a lot of $$$ to train these models. And so we are yet further from having open-source implementations of these models than we are from someone (like DeepMind) having them...
With time, you can expect to run coherent models on your edge device. But not quite yet.
Interestingly, code models are constrained even more by difficulties of tokenization and - most crucially - by us not actually having that much code to train on (we already train on all of GitHub, and it doesn't "saturate" the model).
At this stage, we are back to improving model efficiency, I think, especially for code models. But not there yet.
Sorry for the rambling; the actual answer is no, I do not have a really good Codex-type model in open source... yet.
I see. The OpenAI code generator gave me really impressive results for basic to intermediate questions in the data analytics space. I think it's a function of the context you give about the problem (i.e., the literal meaning of the columns in the business context), how objective your question to the model is, plus some other internal model variables that I'm completely unaware of. But it's nice to have your input so I can understand a little bit of what happens under the hood!
Size of the model is a big one. GPT-3 has over 100x as many parameters, for example (175B vs 1.5B). Training data would be another huge one. Architecturally, they aren't that different if I recall correctly; both are decoder-only stacks of transformer self-attention blocks.
Real world capability has GPT-3 giving much better answers, it was a big step up from GPT-2.
Is it anywhere near being able to be run on local consumer hardware?
How long until we can have the GPT3 or 3.5 chatbot locally like we have StableDiffusion locally for image generation?
I've been spoiled by having it accessible offline and with community built support/modifications to it. GPT-3 is super neat but feels like too many guard rails or the custom playground is too pricey.
For inference (so, basically, running it) you need multiple GPUs and hundreds of GBs of GPU memory. As for model size, it's around 100x bigger than SD.
You can forget about running it locally unless you have dozens of high-end GPUs or you want to wait hours/days/weeks (depending on your hardware) for a single response.
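The "hundreds of GBs" claim is easy to sanity-check with weight-only arithmetic (a lower bound - activations and attention caches need memory on top of this):

```python
# Memory needed just to hold the weights: params x bytes per param.
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    # 2 bytes per parameter = fp16/bf16; fp32 would double this.
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(175e9))  # GPT-3: 350 GB just for fp16 weights
print(weight_memory_gb(1.5e9))  # GPT-2 XL: ~3 GB, consumer-GPU territory
```

At 350 GB of weights alone, even a stack of 24 GB consumer cards falls far short, which is why GPT-3-class inference stays in the datacenter.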
I've been thinking of doing something like this but hooked up with ChatGPT/GPT-3 davinci-003. Obviously the model will not load in the browser, but we can call the API. Could be a neat way to interact with the bot.
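Calling the API is straightforward; here is a minimal standard-library sketch from Python (in the browser you'd build the same request with fetch()). The endpoint and model name follow OpenAI's completions API; OPENAI_API_KEY being set in your environment is an assumption of this example:

```python
import json
import urllib.request

# Build a request against OpenAI's completions endpoint.
def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    body = json.dumps({
        "model": "text-davinci-003",
        "prompt": prompt,
        "max_tokens": 128,
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Usage (needs a real key and network access):
# import os
# req = build_request("Say hello", os.environ["OPENAI_API_KEY"])
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```

Note that calling this directly from a web page would expose your API key to visitors, so in practice you'd proxy the request through a small backend.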