2 years ago
#58325
Paula
How do I split a dataset that's already been formatted into Vowpal Wabbit input format into train and test sets?
I'm writing a program that predicts a movie's rating from its title, year, director, actor, and budget. I've scraped information on 990 movies from a web page and decided to try Vowpal Wabbit. I've already formatted the input the way VW requires, but I don't know how to split the data into training and test sets. Here's how I converted the data from .csv to data.vw:
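import pandas as pd
from sklearn.model_selection import train_test_split
# The module path may differ between vowpalwabbit versions
from vowpalwabbit.DFtoVW import DFtoVW, Feature, Namespace, SimpleLabel

# read_csv opens the file itself, so no surrounding open() is needed
filmweb = pd.read_csv("dataWeight.csv", header=0)
X = filmweb.drop("rating_10", axis=1)
y = filmweb.rating_10
# This split is computed but not used by the conversion below
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=42)
converter = DFtoVW(
    df=filmweb,
    label=SimpleLabel(label="rating_10", weight="id"),
    namespaces=[
        Namespace(features=[Feature(col) for col in ["year", "budget"]], name="i"),
        Namespace(features=[Feature(col) for col in ["title", "director", "actor"]], name="c"),
    ],
)
# convert_df() returns a list of strings, one VW-formatted example per row
examples = converter.convert_df()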
And here is how it looks: [screenshot of data.vw]
But what do I do now? Should I split the data before formatting it?
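One possible direction, as a minimal sketch rather than a confirmed solution: since each VW example is a single line of text, the already formatted lines could be shuffled and split directly, without going back to the DataFrame. The helper name `split_vw_examples` and the sample lines below are hypothetical stand-ins for the list returned by `converter.convert_df()`.

```python
import random


def split_vw_examples(examples, test_size=0.3, seed=42):
    """Shuffle VW-formatted lines and split them into (train, test) lists."""
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-ins for the strings returned by converter.convert_df()
examples = [
    f"{i % 10 + 1} |i year:{1990 + i} budget:{i * 1000} |c title=t{i}"
    for i in range(10)
]

train, test = split_vw_examples(examples)
```

Each list could then be written to its own file, one example per line (for instance `train.vw` and `test.vw`, names chosen here only for illustration). With the VW command line, something like `vw -d train.vw -f model.vw` followed by `vw -t -i model.vw -d test.vw -p preds.txt` would train on one file and evaluate on the other. Splitting the DataFrame first and converting each part separately should give an equivalent result.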
python
machine-learning
text-classification
vowpalwabbit
0 Answers