GCE VM instance utilizing only 30% CPU; NN training runs 3 times slower than on my notebook

I have a free trial of GCE for the next few months, so I wanted to train my first DL model on it. I chose the e2-highmem-8 machine type, that is: 8 vCPUs and 64 GB of memory.

Now, when I run my NN algorithm on this instance, one epoch takes 12 seconds to complete, but on my PC it takes only 3-4 seconds. So basically I have to wait 3-4 times longer for a single training run to complete.

I have tried using:

tf.compat.v1.ConfigProto(device_count={"CPU": 8},
                         inter_op_parallelism_threads=1,
                         intra_op_parallelism_threads=16)

and

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(
    inter_op_parallelism_threads=1))

But neither of them worked.
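I also came across the TF 2.x threading API; I believe the equivalent of the settings above would be something like this, though I may be misreading the docs (the thread counts here are my guesses for 8 vCPUs):

import tensorflow as tf

# must run before TensorFlow executes any operations
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads inside a single op
tf.config.threading.set_inter_op_parallelism_threads(1)  # ops running in parallel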

Since I am a beginner, both in DL programming and in using GCE, I am not sure whether I used the correct VM settings to utilize 100% of the CPUs.
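For what it's worth, something like this should show what the VM actually exposes to Python and TensorFlow (I assume a returned 0 means TensorFlow picks its own default):

import os
import tensorflow as tf

print(os.cpu_count())  # number of vCPUs visible to the OS
print(tf.config.threading.get_intra_op_parallelism_threads())  # 0 = TF default
print(tf.config.threading.get_inter_op_parallelism_threads())  # 0 = TF default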

Since I have no idea whether I misused the tf settings or whether my code itself is the problem, I am including the code below. First, though, I should explain why I wrapped the NN code in loops: as a beginner I had a hard time tuning hyperparameters, so I put the NN code inside three loops that vary the number of neurons, the learning rate, and the number of epochs. That is why the computing time matters so much to me.

code:

import sys
import os
import sklearn
import math
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from datetime import datetime
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow import keras
from contextlib import contextmanager
np.set_printoptions(linewidth=3000)
import joblib


@contextmanager
def show_complete_array():
    oldoptions = np.get_printoptions()
    np.set_printoptions(threshold=np.inf)
    try:
        yield
    finally:
        np.set_printoptions(**oldoptions)

def split_dataset(data, output_len):
    # note: this helper is not used anywhere below
    input_len = len(data)
    samples = []
    for i in range(0, input_len, output_len):
        sample = data[i : i + output_len, :]  # one chunk of output_len rows
        samples.append(sample)
    return samples


n_steps = 60
def plot_series(series, y=None, y_pred=None, x_label="$t$", y_label="$x(t)$"):
    plt.plot(series, ".-")
    if y is not None:
        plt.plot(n_steps, y, "bx", markersize=10)
    if y_pred is not None:
        plt.plot(n_steps, y_pred, "ro")
    plt.grid(True)
    if x_label:
        plt.xlabel(x_label, fontsize=16)
    if y_label:
        plt.ylabel(y_label, fontsize=16, rotation=0)
    plt.hlines(0, 0, 100, linewidth=1)
    mini = min(series) - 0.1*min(series)
    maxi = max(series) + 0.1*max(series)
    plt.axis([0, n_steps + 1, mini, maxi])


FILE = 'TVCSILVER60.csv'
FOLDER = 'Data SLV'
PROJECT_ROOT_DIR = '.'
csv_path = os.path.join(PROJECT_ROOT_DIR, FOLDER, FILE)
print(csv_path)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(PROJECT_ROOT_DIR, fig_id + "." + fig_extension)
    print("Zapisywanie rysunku", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

Silver_CFD = pd.read_csv(csv_path, delimiter=',')
Silver_CFD.rename(columns={'Date' : 'Date', 'O' : 'Open', 'H' : 'High', 'L' : 'Low',
                           'C' : 'Close', 'Volume' : 'Volume', 'Volume MA' : 'Volume MA'}, inplace=True)

# Data preparation for the random forest
Silver_CFD_prepared = Silver_CFD.drop(
    columns=['Date', 'High', 'Low', 'Open', 'Volume', 'Volume MA'])

#Silver_CFD_prepared_High = Silver_CFD_prepared['High']
#Silver_CFD_prepared_Low = Silver_CFD_prepared['Low']
#Silver_CFD_prepared_Open = Silver_CFD_prepared['Open']
#Silver_CFD_prepared_Close = Silver_CFD_prepared['Close']

scaler = MinMaxScaler(feature_range=(0.1,0.9), copy=False)

Silver_CFD_prepared = Silver_CFD_prepared.to_numpy(dtype='float32')

data_to_plot = Silver_CFD_prepared

Silver_CFD_prepared = scaler.fit_transform(Silver_CFD_prepared)

joblib.dump(scaler, 'scaler.joblib')



data_train = Silver_CFD_prepared[:12530]
#data_train = data_train.to_numpy(dtype='float32')
data_valid = Silver_CFD_prepared[:12530]  # note: currently the same slice as data_train
#data_valid = data_valid.to_numpy(dtype='float32')
data_test = Silver_CFD_prepared[:12530]   # note: currently the same slice as data_train
#data_test = data_test.to_numpy(dtype='float32')
data_to_operate = Silver_CFD_prepared

to_y = Silver_CFD_prepared.copy()

print(data_train.shape)


x_train = np.empty((12470, 60, 1))
y_train = np.empty((12470, 20))
to_file = np.empty((12470))
index = 0
for i in range(12470):
    # input window: 60 consecutive prices starting at position i
    for j in range(60):
        x_train[i, j, 0] = data_train[index]
        index += 1
    # target: the 20 prices that follow the window
    for k in range(20):
        y_train[i, k] = to_y[i + 60 + k]
    index = i + 1  # next window starts one step after the previous one

print(x_train.shape)
print(y_train.shape)


x_valid = np.empty((12470, 60, 1))
y_valid = np.empty((12470, 20))
index = 0
for i in range(12470):
    for j in range(60):
        x_valid[i, j, 0] = data_valid[index]
        index += 1
    for k in range(20):
        y_valid[i, k] = to_y[i + 60 + k]
    index = i + 1



x_test = np.empty((12470, 60, 1))
y_test = np.empty((12470, 20))
index = 0
for i in range(12470):
    for j in range(60):
        x_test[i, j, 0] = data_test[index]
        index += 1
    for k in range(20):
        y_test[i, k] = to_y[i + 60 + k]
    index = i + 1


print('x_train.shape', x_train.shape, 'x_valid.shape', x_valid.shape, 'x_test.shape', x_test.shape)
print('y_train.shape', y_train.shape, 'y_valid.shape', y_valid.shape, 'y_test.shape', y_test.shape)


#x_train = x_train[:, :-1, :]
#x_valid = x_valid[:, :-1, :]
#x_test = x_test[:, :-1, :]
#print('x_train.shape', x_train.shape, 'x_valid.shape', x_valid.shape, 'x_test.shape', x_test.shape)

'''
# naive forecasting
y_naive_pred = x_valid[:, -1]
naive_mse = np.mean(keras.losses.mean_squared_error(y_valid, y_naive_pred))
print(naive_mse)
plot_series(x_valid[0, :, 0], y_valid[0, 0], y_naive_pred[0, 0])
plt.show()
'''

n_neurons = 16
epochs = 50
lr = 0.0000
for trial in range(12):
    model = keras.models.Sequential([
        keras.layers.SimpleRNN(n_neurons, activation='relu', return_sequences=True, input_shape=[None, 1]),
        keras.layers.Dropout(0.2),
        keras.layers.SimpleRNN(n_neurons),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(20, activation='linear')
    ])
    for iters in range(50):
        lr = 0
        for rates in range(10):
            lr = lr + 0.0005
            optimizer = keras.optimizers.Adam(learning_rate=lr)  # 'lr' is a deprecated alias
            model.compile(loss='mse', optimizer=optimizer)
            history = model.fit(x_train, y_train, epochs=epochs,
                                validation_data=(x_valid, y_valid))


            train_predict = model.predict(x_train)
            valid_predict = model.predict(x_valid)
            test_predict = model.predict(x_test)
            print(test_predict.shape)

            w1, w2 = y_test.shape
            for i in range(w2-1):
                train_score = math.sqrt(mean_squared_error(y_train[:, i], train_predict[:, i]))
                valid_score = math.sqrt(mean_squared_error(y_valid[:, i], valid_predict[:, i]))
                test_score = math.sqrt(mean_squared_error(y_test[:, i], test_predict[:, i]))
                print('Train Score: %.3f RMSE' % (train_score))
                print('Valid Score: %.3f RMSE' % (valid_score))
                print('Test Score: %.3f RMSE' % (test_score))

            sc = joblib.load('scaler.joblib')

            print(x_test.shape)
            print(y_test.shape)
            print(test_predict.shape)


            y_test_temp = y_test.copy()
            x_test_temp = x_test.copy()
            x_test_to_plot = x_test_temp.reshape(12470, 60) #[-1, :, :]
            y_test_to_plot = y_test_temp # [-1, :]
            test_predict_to_plot = test_predict #[-1, :]

            print(x_test_to_plot.shape)
            print(y_test_to_plot.shape)
            print(test_predict_to_plot.shape)

            x_test_inv = sc.inverse_transform(x_test_to_plot)
            y_test_inv = sc.inverse_transform(y_test_to_plot)
            test_predict_inv = sc.inverse_transform(test_predict_to_plot)

            print(x_test_inv.shape)
            print(y_test_inv.shape)
            print(test_predict_inv.shape)

            n_steps_to_plot = np.empty((20))
            for i in range(20):
                n_steps_to_plot[i] = n_steps + i
            plt.subplot(1, 2, 1)
            plt.plot(history.history['loss'], label='loss')
            plt.plot(history.history['val_loss'], label='val_loss')
            plt.legend()
            plt.xlabel('Epochs')
            plt.ylabel('loss function value')
            plt.grid()

            plt.subplot(1, 2, 2)
            plt.plot(x_test_inv[-1, :], label='input prices')
            plt.plot(n_steps_to_plot, y_test_inv[-1, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[-1, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()
            save_fig(f'loss_and_prediction_n_neurons_{n_neurons}_epochs_{epochs}_lr_{lr}')
            plt.close()



            plt.subplot(3, 3, 1)
            plt.plot(n_steps_to_plot, y_test_inv[35, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[35, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 2)
            plt.plot(n_steps_to_plot, y_test_inv[87, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[87, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 3)
            plt.plot(n_steps_to_plot, y_test_inv[457, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[457, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 4)
            plt.plot(n_steps_to_plot, y_test_inv[990, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[990, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 5)
            plt.plot(n_steps_to_plot, y_test_inv[3524, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[3524, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 6)
            plt.plot(n_steps_to_plot, y_test_inv[7896, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[7896, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 7)
            plt.plot(n_steps_to_plot, y_test_inv[12422, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[12422, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 8)
            plt.plot(n_steps_to_plot, y_test_inv[-1, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[-1, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()

            plt.subplot(3, 3, 9)
            plt.plot(n_steps_to_plot, y_test_inv[9544, :], 'b', label='real price')
            plt.plot(n_steps_to_plot, test_predict_inv[9544, :], 'r', label='predicted price')
            plt.grid()
            plt.xlabel('time')
            plt.ylabel('Price')
            plt.legend()
            save_fig(f'predictions_neurons_{n_neurons}_epochs_{epochs}_lr_{lr}')
            plt.close()

        epochs = epochs + 50
    n_neurons = n_neurons + 16
    epochs = 50
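One more thing I wonder about is whether the three nested loops that build x_train, x_valid and x_test matter for speed. I believe they could be replaced by a single vectorized helper like the sketch below (my own untested idea, assuming NumPy >= 1.20 for sliding_window_view; note it only keeps windows that fit entirely inside the series, so the sample count differs slightly from my loops):

import numpy as np

def make_windows(series, n_steps=60, horizon=20):
    # Build (samples, n_steps, 1) inputs and (samples, horizon) targets
    # from a 1-D series, without Python-level loops.
    series = np.asarray(series).ravel()
    n = len(series) - n_steps - horizon + 1
    x = np.lib.stride_tricks.sliding_window_view(series, n_steps)[:n]
    y = np.lib.stride_tricks.sliding_window_view(series, horizon)[n_steps:n_steps + n]
    return x[..., np.newaxis].astype('float32'), y.astype('float32')

x_train, y_train = make_windows(data_train)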

EDIT: I've created a second VM that runs the same script as above, except without the loops, but the utilization hasn't changed and is still capped at 30%. On both VMs I've checked for zombie processes and found none. However, both machines have a lot of sleeping processes. For example: VM2, the instance which runs the script without loops, has a total of 120 processes, 1 running and 119 sleeping;

VM3, the instance which runs the script with loops, has a total of 117 processes, 1 running and 116 sleeping.

But the free CPU on each instance is still around 70% (the id value), and 7% (the sy value) is used by other processes, as displayed by the top command.

EDIT2: I've changed VM2 to the c2d-highmem-8 machine type with AMD Milan CPUs. One epoch now takes 6 seconds to complete. Utilization rose to 45% and is stuck there.

(I don't know why utilization went up ¯\_( ཀ ʖ̯ ཀ)_/¯)

I've also tried adding this code:

from keras import backend as K
import tensorflow as tf
import os

config = tf.ConfigProto(intra_op_parallelism_threads=8,
                        inter_op_parallelism_threads=2,
                        allow_soft_placement=True,
                        device_count={'CPU': 8})
session = tf.Session(config=config)
K.set_session(session)

os.environ["OMP_NUM_THREADS"] = "8"
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

But the computation time of one epoch didn't change, and when I changed the digit '8' (which represents the number of cores) to '4', nothing changed either.
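One thing I am unsure about is the ordering: from what I've read, these environment variables have to be set before TensorFlow is first imported, otherwise they may have no effect, so maybe that was my mistake. Something like this (untested on my side):

import os

# set thread-related variables before TensorFlow is imported
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

import tensorflow as tf  # imported only after the variables are in place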

I've searched through every site on the first page of Google results with multiple keyword combinations. I don't know what else I can do.

Maybe someone will know how to deal with this problem...

To be honest, since one epoch takes 4 seconds on average on my PC, I was hoping for 1 second or less per epoch on the VM instance.

Tags: python, tensorflow, virtual-machine, google-compute-engine
