Only Coders - Where knowledge meets opportunity

python (12.9k questions)

javascript (9.2k questions)

reactjs (4.7k questions)

java (4.2k questions)

c# (3.5k questions)

html (3.3k questions)

Questions - huggingface-tokenizers

Tokenizer didn't add BOS token when encoding the sentence

I would like to encode the sentence with BOS and EOS token. When I load a pretrained tokenizer, there is no BOS token, so I added BOS token to the tokenizer. After that, I encoded the sentence. model_...

alryosha

huggingface-tokenizers

Votes: 0

Answers: 1

Latest Answer

Even though you've specified bos_token to be a string of your choosing, you still need to set the add_bos_token property of tokenizer to True to get the tokenizer to stick a bos_token on the front of ...

picklerick63

ValueError: The state dictionary of the model you are trying to load is corrupted. Are you sure it was properly saved?

Goal: Amend this Notebook to work with albert-base-v2 model Kernel: conda_pytorch_p36. Section 1.2 instantiates a model from files in ./MRPC/ dir. However, I think it is for a BERT model, not Albert. ...

DanielBell99

python

huggingface-transformers

bert-language-model

onnx

huggingface-tokenizers

Votes: 0

Answers: 1

Latest Answer

Exactly what I was looking for, textattack/albert-base-v2-MRPC How to use from the 🤗/transformers library from transformers import AutoTokenizer, AutoModelForSequenceClassification tokenizer = AutoT...

DanielBell99

Optimize Albert HuggingFace model

Goal: Amend this Notebook to work with albert-base-v2 model Kernel: conda_pytorch_p36. Section 2.1 exports the finalised model. It too uses a BERT specific function. However, I cannot find an equivale...

DanielBell99

python

huggingface-transformers

bert-language-model

onnx

huggingface-tokenizers

Votes: 0

Answers: 1

Latest Answer

Optimise any PyTorch model, using torch_optimizer. Installation: pip install torch_optimizer Implementation: import torch_optimizer as optim # model = ... optimizer = optim.DiffGrad(model.parameters...

DanielBell99

Chunked tokenization in huggingface has an arrow error

I'm following the code from this video at 1m25s, which shows: def tokenize_and_chunk(texts): return tokenizer( texts["text"], truncation=True, max_length=context_length, return ove...

Mittenchops

python

pyarrow

apache-arrow

huggingface-tokenizers

huggingface-datasets

Votes: 0

Answers: 1

Latest Answer

Your Dataset is containing not only "text" column but "id" column as well. Remove column "id" and run map function - all works now. Also, in Your video tutorial. If You l...

Domarm

Posts

Questions

Blogs

Jobs

Questions about huggingface-tokenizers

Read more about huggingface-tokenizers