GPT 3.5 vs. Llama 2 fine-tuning: A Comprehensive Comparison

In this post, I document my experiments benchmarking the fine-tuning of GPT 3.5 against Llama 2 on an SQL task and a functional representation task. I trained CodeLlama 34B and GPT 3.5 to convergence on both tasks; GPT 3.5 achieved slightly better accuracy on each.

For GPT 3.5 fine-tuning, OpenAI only allows the number of epochs to be configured. To keep the comparison fair, I did minimal hyperparameter tuning on the Llama side as well: I let OpenAI choose the number of epochs and trained Llama until it converged on the eval set.

For the SQL task, I also used the Spider eval repo to calculate the execution accuracy of the generated SQL queries. The repo sets up dummy databases, runs the queries produced by GPT 3.5 and Llama 2 against them, and compares the results with the ground-truth outputs.
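To illustrate how constrained the OpenAI side is, here is a minimal sketch of launching a GPT 3.5 fine-tuning job with the openai Python client. The training filename is a placeholder; omitting the hyperparameters lets OpenAI pick the number of epochs, which is the setup described above.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the chat-formatted JSONL training set (hypothetical filename).
training_file = client.files.create(
    file=open("sql_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job. Epochs are effectively the only knob;
# leaving hyperparameters unset lets OpenAI choose n_epochs itself.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```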
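For intuition about what execution accuracy measures, here is a simplified sketch, not the Spider repo's actual evaluation code (which also handles value normalization and per-example database selection): it runs a gold query and a predicted query against a SQLite dummy database and compares the returned rows.

```python
import sqlite3

def execution_match(db_path: str, gold_sql: str, pred_sql: str) -> bool:
    """True if the predicted query returns the same rows as the gold query."""
    conn = sqlite3.connect(db_path)
    try:
        gold_rows = conn.execute(gold_sql).fetchall()
        try:
            pred_rows = conn.execute(pred_sql).fetchall()
        except sqlite3.Error:
            return False  # invalid SQL counts as a miss
        # Compare as multisets: row order is usually not part of the task.
        return sorted(map(repr, gold_rows)) == sorted(map(repr, pred_rows))
    finally:
        conn.close()

# Hypothetical usage over a list of (gold, predicted) query pairs:
# accuracy = sum(execution_match("concert_singer.sqlite", g, p)
#                for g, p in pairs) / len(pairs)
```

Execution accuracy is then simply the fraction of examples for which this check passes.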