The easiest way I found to play around with LLMs was through Hugging Face. The Hugging Face Python libraries offer a straightforward way to call state-of-the-art LLMs and query them. I've been going over the Hugging Face LLMs tutorial, and the following couple of posts are based on Chapter 2 of the Hugging Face course. These posts mostly summarise and sometimes expand on the material in the chapter.

The Hugging Face transformers library allows you to download a pre-trained model. A model has an architecture: the basic framework or blueprint of the machine-learning model to be, defining which parameters there are and how they are handled. A snapshot of the architecture's parameters after training on data is referred to as a checkpoint. Note that training does not need to be finished when the snapshot is made: many models have been pre-trained but still require further fine-tuning before actual deployment. Together, the architecture and a checkpoint constitute a model. However, the terminology is not strict, and the term model is sometimes used to refer to either the architecture or the checkpoint.
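
To make the distinction a bit more concrete, here is a minimal sketch (not from the course chapter) using DistilBERT: the first model is built from the architecture alone and starts with randomly initialised weights, the second loads a trained checkpoint into the same architecture.

from transformers import DistilBertConfig, DistilBertModel

# Architecture only: the config describes the blueprint (layer sizes, number of
# attention heads, etc.), so this model starts with randomly initialised weights.
config = DistilBertConfig()
untrained_model = DistilBertModel(config)

# Architecture + checkpoint: the same blueprint, but with trained weights
# loaded from the "distilbert-base-uncased" checkpoint.
pretrained_model = DistilBertModel.from_pretrained("distilbert-base-uncased")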

Pipelines are used by Hugging Face to call models for deployment and experimentation. The code below calls a text-classification model called distilbert-base-uncased-finetuned-sst-2-english.

from transformers import pipeline

pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
pipe("I really enjoy swimming!")

Running this will output

[{'label': 'POSITIVE', 'score': 0.9998099207878113}]
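
As a side note, a pipeline also accepts a list of sentences and returns one prediction per input; the second sentence below is just an illustrative example.

pipe(["I really enjoy swimming!", "This has been a terrible day."])
# Returns a list with one {'label': ..., 'score': ...} dict per input sentence.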

In the code above, quite a lot of stuff happens under the hood. The pipeline consists of a tokenizer, a model, and lastly a post-processing step. Splitting this code up into its separate steps gives the following code, which we will go over in more detail in the next couple of posts.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenization: turn the raw text into PyTorch tensors of token ids.
sequence = ["I really enjoy swimming!"]
tokens = tokenizer(sequence, return_tensors="pt")

# Model: run the tokens through the network to get raw scores (logits).
output = model(**tokens)

# Post-processing: turn the logits into probabilities and pick the most likely label.
predictions = torch.nn.functional.softmax(output.logits, dim=-1)
highest_probability = predictions[0].max().item()
highest_label = predictions[0].argmax().item()
print("Highest probability:", highest_probability, ", label:", model.config.id2label[highest_label])
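
To get a feel for what each of the three stages produces, it can help to print the intermediate results (this assumes the variables from the code above are still in scope).

print(tokens)          # dict-like object holding the 'input_ids' and 'attention_mask' tensors
print(output.logits)   # raw, unnormalised scores straight from the model
print(predictions)     # probabilities after the softmax post-processing step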