1 year ago
#75457
javier.desario
Multi Task Learning with different sample size in each pair input/output
I am implementing an MTL solution for a regression model on a well-known benchmark dataset for this kind of application (the School dataset from Manash).
I was able to train the model using 3 inputs, each with a different sample size. More specifically, I have 2 datasets with shape (91, 28) and 1 with shape (212, 28), and each one has its own labels with shapes (91, 1), (91, 1) and (212, 1) respectively.
I split each dataset into training, validation and test sets in the same proportions.
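For reference, here is a minimal sketch of how the data is laid out and split. The arrays below are just random placeholders with the same shapes as my real data, and sklearn's train_test_split stands in for my actual splitting code:

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shapes as my real data (random values, illustration only)
X = {0: np.random.rand(91, 28), 1: np.random.rand(91, 28), 2: np.random.rand(212, 28)}
y = {0: np.random.rand(91, 1), 1: np.random.rand(91, 1), 2: np.random.rand(212, 1)}

train_inputs, test_inputs, train_outputs, test_outputs = {}, {}, {}, {}
for i in X:
    # Same 80/20 proportion for every school (validation is handled later via validation_split)
    train_inputs[i], test_inputs[i], train_outputs[i], test_outputs[i] = train_test_split(
        X[i], y[i], test_size=0.2, random_state=42)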
Using the Keras API, I coded the following network architecture:
Layer (type)                  Output Shape    Param #   Connected to
==================================================================================
school_1_in (InputLayer)      [(None, 28)]    0         []
school_2_in (InputLayer)      [(None, 28)]    0         []
school_3_in (InputLayer)      [(None, 28)]    0         []
concatenate_10 (Concatenate)  (None, 84)      0         ['school_1_in[0][0]',
                                                          'school_2_in[0][0]',
                                                          'school_3_in[0][0]']
dense_41 (Dense)              (None, 16)      1360      ['concatenate_10[0][0]']
dense_42 (Dense)              (None, 8)       136       ['dense_41[0][0]']
dense_43 (Dense)              (None, 4)       36        ['dense_42[0][0]']
dense_44 (Dense)              (None, 4)       36        ['dense_42[0][0]']
dense_45 (Dense)              (None, 4)       36        ['dense_42[0][0]']
school_1_out (Dense)          (None, 1)       5         ['dense_43[0][0]']
school_2_out (Dense)          (None, 1)       5         ['dense_44[0][0]']
school_3_out (Dense)          (None, 1)       5         ['dense_45[0][0]']
==================================================================================
Total params: 1,619
Trainable params: 1,619
Non-trainable params: 0
There are 3 input layers, one for each dataset's training split, followed by a concatenation and 2 shared dense layers that learn a feature representation of the combined input; then 3 task-specific dense layers, one per output, learn higher-level representations.
Here is the code for the model (I keep each train input and train output in a dict just for simplicity):
# Modelling - Keras Functional API
from tensorflow.keras import Input, layers, models

# One input per school; each training split has 28 features
input_tensor_1 = Input(shape=(train_inputs[0].shape[1],), dtype='int32', name='school_1_in')
input_tensor_2 = Input(shape=(train_inputs[1].shape[1],), dtype='int32', name='school_2_in')
input_tensor_3 = Input(shape=(train_inputs[2].shape[1],), dtype='int32', name='school_3_in')

# Shared trunk: concatenate the three inputs and learn a common representation
concatenated = layers.concatenate([input_tensor_1, input_tensor_2, input_tensor_3],
                                  axis=-1)
shared_layer_1 = layers.Dense(16, activation='relu')(concatenated)
shared_layer_2 = layers.Dense(8, activation='relu')(shared_layer_1)

# Task-specific heads, one per school
hidden_1 = layers.Dense(4, activation='relu')(shared_layer_2)
hidden_2 = layers.Dense(4, activation='relu')(shared_layer_2)
hidden_3 = layers.Dense(4, activation='relu')(shared_layer_2)
output_1 = layers.Dense(1, name='school_1_out')(hidden_1)
output_2 = layers.Dense(1, name='school_2_out')(hidden_2)
output_3 = layers.Dense(1, name='school_3_out')(hidden_3)

model = models.Model([input_tensor_1, input_tensor_2, input_tensor_3],
                     [output_1, output_2, output_3])

model.compile(optimizer='adam',
              loss={
                  'school_1_out': 'mse',
                  'school_2_out': 'mse',
                  'school_3_out': 'mse'
              },
              metrics=['mae'])

epochs = 300
model.fit({'school_1_in': train_inputs[0], 'school_2_in': train_inputs[1], 'school_3_in': train_inputs[2]},
          {'school_1_out': train_outputs[0], 'school_2_out': train_outputs[1], 'school_3_out': train_outputs[2]},
          epochs=epochs,
          batch_size=32,
          validation_split=0.2,
          verbose=1)

history_dict = model.history.history
model.summary()
This code runs successfully. But when I then try to evaluate the model on the test data for each school, I get the following error:
model.evaluate([test_inputs[0], test_inputs[1], test_inputs[2]],
               [test_outputs[0], test_outputs[1], test_outputs[2]])
ValueError: Data cardinality is ambiguous:
x sizes: 19, 19, 43
y sizes: 19, 19, 43
Make sure all arrays contain the same number of samples.
I would understand this error in any other context, but here the whole point is that the number of samples differs per task. I have read many papers on MTL with this kind of approach, but I could not find any code example showing how to implement it correctly.
My question is: how can I evaluate the model on the test data when the inputs obviously have different numbers of samples? And, for that matter, how is the model even trained in the first place? I understand that the forward pass of the back-propagation algorithm needs a 3-input/3-output pair at every step in order to activate every neuron of the shared layers and compute the loss for each output in a given batch, but in that case the larger dataset would have to be trained with missing data for the other two.
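To make that last point concrete, here is a rough sketch of what I imagine would be needed just to satisfy the cardinality check (purely hypothetical, I have not tried this; pad_to_length is a helper I made up for illustration). It pads the two smaller test sets with NaN rows up to the size of the largest one, which is exactly the "missing data" problem I am asking about:

import numpy as np

def pad_to_length(arr, n_rows):
    # Append NaN rows so that arr ends up with n_rows rows (hypothetical padding scheme)
    pad = np.full((n_rows - arr.shape[0], arr.shape[1]), np.nan)
    return np.vstack([arr, pad])

n_max = max(test_inputs[i].shape[0] for i in range(3))  # 43 in my case
padded_inputs = [pad_to_length(test_inputs[i], n_max) for i in range(3)]
padded_outputs = [pad_to_length(test_outputs[i], n_max) for i in range(3)]

# This would pass the cardinality check, but the NaN rows are precisely the missing
# samples for the two smaller schools -- I don't know how the loss should handle them.
model.evaluate(padded_inputs, padded_outputs)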
I appreciate any help with this issue. Maybe the whole implementation is ill-defined, but having read so many papers that allow this difference in the number of samples, I assume I may just be making a silly mistake.
Thank you!
python
numpy
tensorflow
keras
deep-learning