Multi Task Learning with different sample size in each pair input/output

I am implementing an MTL solution for a regression model on a well-known benchmark dataset for this kind of application (the School dataset from Manash).

I was able to train the model using 3 inputs, each with a different sample size. More specifically, I have 2 datasets with shape (91, 28) and 1 with shape (212, 28), and each one has its own labels, with shapes (91, 1), (91, 1) and (212, 1) respectively.

I split each dataset into training, validation and testing sets in the same proportions.
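
For reference, this is roughly how I build the splits (the names and the data loading below are just illustrative, using random arrays with the real shapes; validation is handled later through validation_split in fit):

import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for the three school datasets (real loading code omitted):
# two tasks with 91 samples and one with 212, all with 28 features and 1 target.
rng = np.random.default_rng(0)
X_1, y_1 = rng.normal(size=(91, 28)), rng.normal(size=(91, 1))
X_2, y_2 = rng.normal(size=(91, 28)), rng.normal(size=(91, 1))
X_3, y_3 = rng.normal(size=(212, 28)), rng.normal(size=(212, 1))

train_inputs, train_outputs = {}, {}
test_inputs, test_outputs = {}, {}
for i, (X, y) in enumerate([(X_1, y_1), (X_2, y_2), (X_3, y_3)]):
    # Same held-out proportion (~20%) for every task, which is where the
    # test sizes 19, 19 and 43 reported below come from.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    train_inputs[i], train_outputs[i] = X_tr, y_tr
    test_inputs[i], test_outputs[i] = X_te, y_te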

Using the Keras API, I coded the following network architecture:

 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 school_1_in (InputLayer)       [(None, 28)]         0           []                               
                                                                                                  
 school_2_in (InputLayer)       [(None, 28)]         0           []                               
                                                                                                  
 school_3_in (InputLayer)       [(None, 28)]         0           []                               
                                                                                                  
 concatenate_10 (Concatenate)   (None, 84)           0           ['school_1_in[0][0]',            
                                                                  'school_2_in[0][0]',            
                                                                  'school_3_in[0][0]']            
                                                                                                  
 dense_41 (Dense)               (None, 16)           1360        ['concatenate_10[0][0]']         
                                                                                                  
 dense_42 (Dense)               (None, 8)            136         ['dense_41[0][0]']               
                                                                                                  
 dense_43 (Dense)               (None, 4)            36          ['dense_42[0][0]']               
                                                                                                  
 dense_44 (Dense)               (None, 4)            36          ['dense_42[0][0]']               
                                                                                                  
 dense_45 (Dense)               (None, 4)            36          ['dense_42[0][0]']               
                                                                                                  
 school_1_out (Dense)           (None, 1)            5           ['dense_43[0][0]']               
                                                                                                  
 school_2_out (Dense)           (None, 1)            5           ['dense_44[0][0]']               
                                                                                                  
 school_3_out (Dense)           (None, 1)            5           ['dense_45[0][0]']               
                                                                                                  
==================================================================================================
Total params: 1,619
Trainable params: 1,619
Non-trainable params: 0

There are 3 input layers, one for each dataset's training split, followed by a concatenation and 2 shared dense layers that learn a feature representation of the combined input; after that, 3 task-specific dense layers (one per output) learn higher-level representations for each task.

Here is the code for the model:

(I stored each training input and output in a dict, just for simplicity.)

# Modelling - Keras Functional API
from tensorflow.keras import layers, models
from tensorflow.keras.layers import Input

# One input layer per school dataset (28 features each)
input_tensor_1 = Input(shape=(train_inputs[0].shape[1],), dtype='int32', name='school_1_in')
input_tensor_2 = Input(shape=(train_inputs[1].shape[1],), dtype='int32', name='school_2_in')
input_tensor_3 = Input(shape=(train_inputs[2].shape[1],), dtype='int32', name='school_3_in')

# Concatenate the three 28-feature inputs into one 84-feature vector
concatenated = layers.concatenate([input_tensor_1, input_tensor_2, input_tensor_3],
                                   axis=-1)

# Shared layers: a common representation for all three tasks
shared_layer_1 = layers.Dense(16, activation='relu')(concatenated)
shared_layer_2 = layers.Dense(8, activation='relu')(shared_layer_1)

# Task-specific hidden layers
hidden_1 = layers.Dense(4, activation='relu')(shared_layer_2)
hidden_2 = layers.Dense(4, activation='relu')(shared_layer_2)
hidden_3 = layers.Dense(4, activation='relu')(shared_layer_2)

# One linear regression output per school
output_1 = layers.Dense(1, name='school_1_out')(hidden_1)
output_2 = layers.Dense(1, name='school_2_out')(hidden_2)
output_3 = layers.Dense(1, name='school_3_out')(hidden_3)

model = models.Model([input_tensor_1, input_tensor_2, input_tensor_3], 
                     [output_1, output_2, output_3])

model.compile(optimizer='adam', 
              loss={
                  'school_1_out': 'mse',
                  'school_2_out': 'mse',
                  'school_3_out': 'mse'
                   },
              metrics=['mae']
             )

epochs = 300

model.fit({'school_1_in': train_inputs[0], 'school_2_in': train_inputs[1], 'school_3_in': train_inputs[2]}, 
          {'school_1_out': train_outputs[0], 'school_2_out': train_outputs[1], 'school_3_out': train_outputs[2]},
          epochs=epochs,
          batch_size=32,
          validation_split=0.2,
          verbose=1)

history_dict = model.history.history
model.summary()

This code runs successfully. Then I try to evaluate it on the test data for each task and I get the following error:

model.evaluate([test_inputs[0], test_inputs[1], test_inputs[2]], 
               [test_outputs[0], test_outputs[1], test_outputs[2]])
ValueError: Data cardinality is ambiguous:
  x sizes: 19, 19, 43
  y sizes: 19, 19, 43
Make sure all arrays contain the same number of samples.

I would understand this error in any other context, but here the whole point of the approach is that the number of samples differs across tasks. I have read many papers on MTL with this kind of setup, but I could not find any code example showing how to implement it correctly.

My question is: how can I evaluate the model on the test data when the inputs obviously have different numbers of samples? And, for that matter, how is the model even trained in the first place? I understand that the forward pass of the back-propagation algorithm needs a 3-input/3-output pair at every step in order to activate every neuron of the shared layers and compute the loss for each output in a given batch, but in that case the larger input would have to be trained with missing data for the other two.
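
To make what I mean by "missing data" concrete, the only workaround I can picture is something like the sketch below, padding the two smaller test sets with zero rows up to the size of the largest one (purely illustrative, using the names from my code above; it satisfies the cardinality check, but the padded rows obviously distort the metrics, so I doubt this is what the papers intend):

import numpy as np

def pad_to(x, n_samples):
    # Append zero rows until the array has n_samples rows
    pad_rows = n_samples - x.shape[0]
    return np.vstack([x, np.zeros((pad_rows,) + x.shape[1:])])

# Pad every task up to the largest test set (43 samples here)
max_n = max(test_inputs[i].shape[0] for i in range(3))
padded_x = [pad_to(test_inputs[i], max_n) for i in range(3)]
padded_y = [pad_to(test_outputs[i], max_n) for i in range(3)]

# model.evaluate(padded_x, padded_y)  # runs, but the zero rows pollute the per-task metrics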

I appreciate any help regarding this issue. Maybe the whole implementation is ill-defined, but having read so many papers that allow this difference in the number of samples, I assume I am just making a silly mistake.

Thank you!

Tags: python, numpy, tensorflow, keras, deep-learning
