2 years ago

#58325

Paula

How to split a dataset (that's already been formatted into Vowpal Wabbit input format) into train and test sets?

I'm writing a program that predicts a movie's rating based on its title, year, director, actor and budget. I scraped information about 990 movies from a web page and decided to try Vowpal Wabbit. I've already formatted the input as VW requires, but I don't know how to split the data into training and test sets. Here's how I converted the data from .csv to data.vw:

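import pandas as pd
from sklearn.model_selection import train_test_split
# Newer vowpalwabbit releases (9.x) name this module vowpalwabbit.dftovw
from vowpalwabbit.DFtoVW import DFtoVW, Feature, Namespace, SimpleLabel

# read_csv opens the file itself, so no separate open() is needed
filmweb = pd.read_csv("dataWeight.csv", header=0)

X = filmweb.drop('rating_10', axis=1)
y = filmweb.rating_10

# Note: this split is computed but not used by the converter below
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=42)

# The converter is built from the full DataFrame, not from the split parts
converter = DFtoVW(df=filmweb,
                   label=SimpleLabel(label="rating_10", weight="id"),
                   namespaces=[
                       Namespace(features=[Feature(col) for col in ["year", "budget"]], name="i"),
                       Namespace(features=[Feature(col) for col in ["title", "director", "actor"]], name="c")
                   ])

# convert_df() returns the examples as a list of VW-format strings
examples = converter.convert_df()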

And here is how the resulting data.vw looks: [screenshot of data.vw]

But what do I do now? Should I split the data before formatting it?
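For reference, here is a minimal sketch of one possible way to split data that is already in VW format. It assumes the single-line format, where every non-blank line is one complete example, so shuffling and slicing the lines is enough. The function name `split_vw_lines`, the sample lines and the 30% test share are all hypothetical and only there for illustration; this is not a Vowpal Wabbit API.

```python
import random


def split_vw_lines(lines, test_size=0.3, seed=42):
    """Shuffle VW-format example lines and split them into (train, test) lists."""
    # Assumption: single-line VW format, so each non-blank line is one example.
    examples = [line.rstrip("\n") for line in lines if line.strip()]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(examples)
    n_test = int(round(len(examples) * test_size))
    # The first n_test shuffled examples become the test set, the rest the train set.
    return examples[n_test:], examples[:n_test]


# Hypothetical examples, shaped roughly like the namespaces in the question
sample = [
    f"{5 + i % 5} |i year:{1990 + i} budget:{1000000 * (i + 1)} "
    f"|c title=t{i} director=d{i} actor=a{i}"
    for i in range(10)
]

train, test = split_vw_lines(sample, test_size=0.3)
```

The same function could be applied to the `examples` list from the converter, or to the lines read from data.vw, with each returned list then written to its own file. Another option, not shown here, would be to split the DataFrame first and convert each part with the same DFtoVW settings.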

python

machine-learning

text-classification

vowpalwabbit

0 Answers
