Hi everyone,
I am following the steps of the tutorial (https://pytorch-tabular.readthedocs.io/en/latest/tutorials/01-Basic_Usage/) in order to train a model on my own datasets.
If I am not mistaken, the train_test_split step makes no distinction between the target column and the feature columns the model will be trained on. I assume this distinction is instead made through the data_config dictionary provided to the TabularModel.
However, I was surprised to observe that the predict function runs only if a target column is provided (Class, labels, or whatever it is named). Does this mean the classes are treated as features during the training phase? What use is the model if I cannot feed it unseen, unlabeled data and get predictions back (even if I happen to have the true values available for validation)?
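For context, this is roughly how I understood the target/feature separation is declared. A minimal sketch, assuming a DataConfig like the one in the tutorial; the column names here are hypothetical placeholders, not my real ones:

```python
from pytorch_tabular.config import DataConfig

# The target is declared here, not in train_test_split;
# "target", "feat_num_1", etc. are hypothetical column names.
data_config = DataConfig(
    target=["target"],
    continuous_cols=["feat_num_1", "feat_num_2"],
    categorical_cols=["feat_cat_1"],
)
```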
I am using an extended version of the following function to extract metrics:
```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

def print_metrics(y_true, y_pred, tag):
    # Accept DataFrames/Series as well as plain arrays
    if isinstance(y_true, (pd.DataFrame, pd.Series)):
        y_true = y_true.values
    if isinstance(y_pred, (pd.DataFrame, pd.Series)):
        y_pred = y_pred.values
    # Flatten to 1-D, as the sklearn metrics expect
    if y_true.ndim > 1:
        y_true = y_true.ravel()
    if y_pred.ndim > 1:
        y_pred = y_pred.ravel()
    val_acc = accuracy_score(y_true, y_pred)
    val_f1 = f1_score(y_true, y_pred)
    print(f"{tag} Acc: {val_acc} | {tag} F1: {val_f1}")
```
My main question is: if y_true is necessarily part of the model's input (as test['target']), isn't the predicted outcome biased? Does TabularModel make any internal distinction? And is there a way to avoid providing the target column altogether (which is impractical under real-time conditions and in experiments)?
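As a workaround I am currently considering the sketch below. It assumes that predict only needs the target column to *exist*, not to hold correct values (I have not verified this against the library internals); the feature names are hypothetical:

```python
import pandas as pd

# Unseen data with no labels (hypothetical feature names)
unseen = pd.DataFrame({"feat_1": [0.1, 0.5], "feat_2": [1.2, 0.3]})

# Add a placeholder target column so predict() does not complain;
# its values should never influence the predictions if the target
# is correctly excluded from the features internally.
unseen["target"] = 0

# pred_df = tabular_model.predict(unseen)  # hypothetical trained model
```

If the placeholder values changed the predictions, that would confirm the leakage I am worried about.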
Thank you for your time! I am looking forward to your responses!