Hi everyone,
I am following the steps of the tutorial (https://pytorch-tabular.readthedocs.io/en/latest/tutorials/01-Basic_Usage/) in order to train a model on my own datasets.
If I am not mistaken, the train_test_split step makes no distinction between the target column and the feature columns the model will be trained on. I assume this distinction is instead made through the data_config dictionary provided to the TabularModel.
However, I was surprised to observe that the predict function runs only if a target column is provided (Class, labels, or whatever it is named). Does this mean the classes are treated as features during the training phase? What use is the model if I cannot feed it unseen, unlabeled data and get predictions back (even if I happen to have the true values available for validation)?
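For context, this is roughly how I understood the target/feature separation is declared. A minimal sketch, assuming a DataConfig like the one in the tutorial; the column names here are hypothetical placeholders, not my real ones:

```python
from pytorch_tabular.config import DataConfig

# The target is declared here, not in train_test_split;
# "target", "feat_num_1", etc. are hypothetical column names.
data_config = DataConfig(
    target=["target"],
    continuous_cols=["feat_num_1", "feat_num_2"],
    categorical_cols=["feat_cat_1"],
)
```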
I am using an extended version of the following function to extract metrics:
```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

def print_metrics(y_true, y_pred, tag):
    # Accept DataFrames/Series as well as plain arrays
    if isinstance(y_true, (pd.DataFrame, pd.Series)):
        y_true = y_true.values
    if isinstance(y_pred, (pd.DataFrame, pd.Series)):
        y_pred = y_pred.values
    # Flatten to 1-D, as the sklearn metrics expect
    if y_true.ndim > 1:
        y_true = y_true.ravel()
    if y_pred.ndim > 1:
        y_pred = y_pred.ravel()
    val_acc = accuracy_score(y_true, y_pred)
    val_f1 = f1_score(y_true, y_pred)
    print(f"{tag} Acc: {val_acc} | {tag} F1: {val_f1}")
```
My main question is: if y_true is necessarily part of the model's input (as test['target']), isn't the predicted outcome biased? Does TabularModel make any internal distinction? And is there a way to avoid providing the target column altogether (which is impractical under real-time conditions and in experiments)?
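As a workaround I am currently considering the sketch below. It assumes that predict only needs the target column to *exist*, not to hold correct values (I have not verified this against the library internals); the feature names are hypothetical:

```python
import pandas as pd

# Unseen data with no labels (hypothetical feature names)
unseen = pd.DataFrame({"feat_1": [0.1, 0.5], "feat_2": [1.2, 0.3]})

# Add a placeholder target column so predict() does not complain;
# its values should never influence the predictions if the target
# is correctly excluded from the features internally.
unseen["target"] = 0

# pred_df = tabular_model.predict(unseen)  # hypothetical trained model
```

If the placeholder values changed the predictions, that would confirm the leakage I am worried about.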
Thank you for your time! I am looking forward to your responses!