2 years ago

#58325

Paula

How to split a dataset (that's already been formatted into Vowpal Wabbit input format) into train and test sets?

I'm writing a program that predicts a movie's rating based on its title, year, director, actor and budget. I've scraped info about 990 movies from a web page and decided to try Vowpal Wabbit. I've already formatted the input as VW requires, but I don't know how to split the data into training and test sets. Here's how I converted the data from .csv to data.vw:

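import pandas as pd
from sklearn.model_selection import train_test_split
# Import path assumed for vowpalwabbit 9.x
from vowpalwabbit.dftovw import DFtoVW, SimpleLabel, Namespace, Feature

# pd.read_csv opens the file itself, so no open() wrapper is needed
filmweb = pd.read_csv("dataWeight.csv", header=0)

X = filmweb.drop('rating_10', axis=1)
y = filmweb.rating_10

# Note: the result of this split is not used by the converter below
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=42)

# Convert the whole DataFrame to VW format: numeric features in namespace "i",
# categorical features in namespace "c"
converter = DFtoVW(df=filmweb,
                   label=SimpleLabel(label="rating_10", weight="id"),
                   namespaces=[
                       Namespace(features=[Feature(col) for col in ["year", "budget"]], name="i"),
                       Namespace(features=[Feature(col) for col in ["title", "director", "actor"]], name="c")
                   ])

# One VW-format string per DataFrame row
examples = converter.convert_df()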

And here is how it looks: (screenshot of data.vw)

But what do I do now? Should I split the data before formatting it?
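
For reference, here is a minimal sketch of one possible way to split after formatting. It relies on the fact that every VW example is a single line of text, so the converted lines can be shuffled and split like any other list. The helper names split_vw_lines and write_lines, the file names train.vw and test.vw, and the sample lines are all made up for illustration; in the real program, examples would come from converter.convert_df().

```python
import random


def split_vw_lines(lines, test_size=0.3, seed=42):
    """Shuffle VW-format lines and return (train_lines, test_lines).

    Hypothetical helper, not part of vowpalwabbit.
    """
    shuffled = list(lines)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def write_lines(path, lines):
    """Write one VW example per line to the given path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Placeholder data standing in for converter.convert_df() output
examples = [
    f"{5 + i % 5} |i year:{1990 + i} budget:{1000 * (i + 1)} |c title=film_{i}"
    for i in range(10)
]

train_lines, test_lines = split_vw_lines(examples, test_size=0.3, seed=42)

if __name__ == "__main__":
    write_lines("train.vw", train_lines)
    write_lines("test.vw", test_lines)
```

With the two files written, training could be done with something like vw -d train.vw -f model.vw and evaluation with vw -d test.vw -t -i model.vw -p predictions.txt. Splitting the DataFrame first and running DFtoVW on each part separately should give an equivalent result, since each row becomes exactly one line.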

python

machine-learning

text-classification

vowpalwabbit

0 Answers
