python (12.9k questions)
javascript (9.2k questions)
reactjs (4.7k questions)
java (4.2k questions)
c# (3.5k questions)
html (3.3k questions)
Chunked tokenization in huggingface has an arrow error
I'm following the code from this video at 1m25s, which shows:
def tokenize_and_chunk(texts):
    return tokenizer(
        texts["text"], truncation=True, max_length=context_length,
        return ove...
Mittenchops
Votes: 0
Answers: 1
Build a siamese network via huggingface: tokenize two sentences separately using huggingface datasets and transformers along with tensorflow
I'm currently building a siamese network with a pretrained BERT model from transformers, which takes 'input_ids', 'token_type_ids' and 'attention_mask' as inputs.
I've got a dataset structured as quest...
Frank Bao
Votes: 0
Answers: 1
Loading Huggingface Dataset
I am attempting to load the 'wiki40b' dataset here, based on the instructions provided by Huggingface here. Because the file is potentially so large, I am attempting to load only a small subset of the...
cookie1986
Votes: 0
Answers: 0
HuggingFace Dataset - pyarrow.lib.ArrowMemoryError: realloc of size failed
I am trying to use Hugging Face Datasets for speech recognition using transformers, where I have pairs of text/audio. I am creating a DataFrame without problems from these two lists:
d = pd.DataFrame.fro...
user1680859
Votes: 0
Answers: 1