2023-01-22

No, really, which model?

In another post I complained I didn't know what a machine learning model is.  So I asked ChatGPT.

Me: explain what a model is in machine learning

ChatGPT: A model in machine learning is a set of mathematical equations or algorithms that are used to make predictions or decisions based on input data. The model is trained using a labeled dataset, which means that it is given a set of input-output pairs and learns to map the inputs to the correct outputs. Once the model is trained, it can be used to make predictions on new, unseen data. The quality of a model is often measured by its ability to make accurate predictions or decisions on unseen data.

Well, that's good!  That's sort of what I thought - it's the algorithm plus the data.  But when an ML platform asks me which model I want to use, I still have no idea.  Is there a list of these things somewhere that describes the relevant tradeoffs?  Are there standards?  Or is every specific implementation a new model?  But I read that people are making their own ChatGPT (GPT-3) implementations.  Do they use the model without additional training?
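
If "algorithm plus data" is roughly right, then the little supervised-learning examples I've seen at least make sense: pick an algorithm, fit it to labeled data, and the fitted thing is the model.  A minimal sketch, assuming scikit-learn and some made-up toy data:

from sklearn.linear_model import LogisticRegression

# Toy labeled dataset: inputs (hours studied, hours slept) -> passed exam (1) or not (0).
X = [[1, 4], [2, 8], [5, 5], [8, 7], [9, 8]]
y = [0, 0, 1, 1, 1]

# "Training" = fitting the algorithm's parameters to the input-output pairs.
model = LogisticRegression()
model.fit(X, y)

# The fitted object is the model; now it can predict on new, unseen inputs.
print(model.predict([[3, 6], [7, 9]]))

That still doesn't tell me what the dropdown on an ML platform is actually listing, though.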

Maybe ChatGPT has more to say about this.

Me: how do I make my own gpt-3

ChatGPT: Creating a model like GPT-3 is a complex task that requires significant resources and expertise in machine learning. Here are the general steps to create a model like GPT-3:

1. Collect a large dataset of text. GPT-3 was trained on a dataset of over 570GB of text data.

2. Pre-process the data to clean and format it for training. This includes tasks such as tokenization, lowercasing, and removing special characters.

3. Select an architecture for the model. GPT-3 uses the transformer architecture, which is known for its ability to handle sequential data like text.

4. Train the model using the pre-processed data and the selected architecture. This typically involves using a technique called unsupervised learning, where the model learns to predict the next word in a sentence based on the previous words.

5. Fine-tune the model by training it on a smaller dataset of labeled data. This can improve the model's performance on specific tasks, such as language translation or question answering.

6. Continuously evaluate the model's performance on held out test sets and adjust the model as needed

7. Deploy the model and use it to make predictions or decisions.

It's important to note that GPT-3 was developed by OpenAI and they used a lot of resources like computational power, memory and storage to train it. Additionally, it requires extensive expertise in machine learning, natural language processing, and deep learning to create a model like GPT-3.
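
Step 4 is the part I can at least picture.  A toy version of "learn to predict the next word from the previous words" - nothing like a real transformer, just counting which word tends to follow which - might look like this (my own sketch, not anything from OpenAI):

from collections import Counter, defaultdict

# Step 2, crudely: "tokenize" by splitting on whitespace.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Step 4, crudely: learn which word follows which by counting bigrams.
# A real GPT-style model learns this mapping with a neural network over
# hundreds of gigabytes of text instead of one sentence.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Return the word most often seen after `word` in the training text.
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> 'cat'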

So I guess when people say they are making their own ChatGPT, they aren't really?  Or perhaps they are using the transformer architecture on their own data?
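
My guess: they start from a model someone else already trained and either use it as-is or fine-tune it on their own text.  Using a pretrained one as-is with the Hugging Face transformers library (my assumption about the tooling, not necessarily what those projects use) looks roughly like:

from transformers import pipeline

# Load a small pretrained GPT-style model (GPT-2) and use it without any
# training of our own - just the weights someone else already trained.
generator = pipeline("text-generation", model="gpt2")

print(generator("A model in machine learning is", max_new_tokens=30)[0]["generated_text"])

Fine-tuning would mean continuing to train those weights on your own dataset - step 5 in the list above - instead of starting from scratch with steps 1 through 4.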

Still confused.

Maybe this two-hour video will clear it up for me:

