Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alpaca chat template #7383

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jukofyork
Copy link
Contributor

@jukofyork jukofyork commented May 19, 2024

This PR adds the commonly used 'alpaca' chat template.

I can't actually find any models on huggingface that have the 'alpaca' template in their tokenizer_config.json files so had to manually add it to the Templates-supported-by-llama_chat_apply_template script (edit: see lower for corrected wiki python script addition).

I used the Jinga template given by text-generation-webui:

https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Alpaca.yaml


I should add that even though this template isn't actually used much now, people still do use it for creative writing as a workaround for the problematic Mistral [INST] type templates, so having the ability to manually specify 'alpaca' for these models would help.

It is also subtly different to the existing deepseek-coder template: no'<|EOT|>' and also uses double newlines.

I also chose meta-math/MetaMath-7B-V1.0 fairly arbitrarily, but this template is used in lots of other commonly used models, such as: the wizard family of models, phind-codellama, etc.

@jukofyork
Copy link
Contributor Author

I'm also not sure if the Python script above should do something to deal with default system messages that end with a newline, eg:

----- deepseek-ai/deepseek-coder-33b-instruct -----
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|begin▁of▁sentence|>You are a helpful assistant### Instruction:
Hello
### Response:
Hi there
<|EOT|>
### Instruction:
Who are you
### Response:
   I am an assistant   
<|EOT|>
### Instruction:
Another question
### Response:

Doesn't look correct to me for deepseek-coder as the default system message ends with a newline like so:

----- deepseek-ai/deepseek-coder-33b-instruct -----
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|begin▁of▁sentence|>You are a helpful assistant
### Instruction:
Hello
### Response:
Hi there
<|EOT|>
### Instruction:
Who are you
### Response:
   I am an assistant   
<|EOT|>
### Instruction:
Another question
### Response:

Perhaps the Python script should detect this and move the newline from the default system message into the template itself?

@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

It is also subtly different to the existing deepseek-coder template as it uses '<\s>' instead of '<|EOT|>' and also uses double newlines instead of single. I added it at the end of the conditional tests, as will also trigger this:

else if (tmpl == "alpaca" || (tmpl.find("### Instruction:") != std::string::npos && tmpl.find("### Response:") != std::string::npos)) {

(I can't see anything that is exclusive to the 'alpaca' chat template vs the deepseek-coder template - other than perhaps including the "\n\n" in the test, but wasn't 100% sure this would get passed unescaped/unchanged in the c-string...).

Actually I think the safest way to ensure this is robust against refactoring the order of tests is:

} else if (tmpl == "alpaca" || (tmpl.find("### Instruction:") != std::string::npos && tmpl.find("<|EOT|>") == std::string::npos)) {

So have changed it to that.

Feel free to change this if you can think of a better test or if just else if (tmpl == "alpaca") would be OK considering there don't seem to actually be any real Jinga templates that use this anyway...

@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

I just searched some of the closed PRs and found somebody had almost added the 'alpaca' template before:

#6397

alpaca template and deepseek template both look similar at the first glance, but the main different is that alpaca template only used for instruction-response (one turn) and not multiple turns like modern chat template.

This isn't true as the (now deleted off hugginface) wizard family of models and phind-codellama (using the proper template) allow for multi-turn conversations.

Also, as I mentioned above; people are often using this for the Mistral models that use the "[INST]" template to improve their creative writing ability, etc.

@jukofyork
Copy link
Contributor Author

There's something not right about this as phind-codellama seems way dumber than it does in Ollama using what should be the same template... I'll see if I can find out what is wrong.

@jukofyork
Copy link
Contributor Author

It looks like there is an extra space getting added between the <s> and the ### Instruction:

"tid":"140679538630656","timestamp":1716128666,"level":"VERB","function":"update_slots","line":1955,"msg":"prompt tokenized","id_slot":0,"id_task":0,"n_ctx":16384,"n_keep":0,"n_prompt_tokens":44,"prompt_tokens":"<s> ### Instruction:\nCan you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values\n\n### Response:\n"}

@github-actions github-actions bot added the testing Everything test related label May 19, 2024
@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

So I've completely redone it all using an edited version of the deepseek-coder Jinga template and llama.cpp code and it still gets that random space between the <s> and the ### Instruction::

{"tid":"139645072551936","timestamp":1716131290,"level":"VERB","function":"update_slots","line":1955,"msg":"prompt tokenized","id_slot":0,"id_task":980,"n_ctx":16384,"n_keep":0,"n_prompt_tokens":1039,"prompt_tokens":"<s> ### Instruction:\nCan you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values\n\n### Response:\nSure, here's a simple C++ program that uses the GNU Scientific Library (GSL) to perform logistic regression. Note that you need to have GSL installed on your system to compile this code.\n\n```cpp\n#include <iostream>\n#include <gsl/gsl_multifit.h>\n\nvoid logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {\n    double a = gsl_vector_get(params, 0);\n    double b = gsl_vector_get(params, 1);\n\n    size_t n = x->size;\n\n    for (size_t i = 0; i < n; ++i) {\n        double xi = gsl_vector_get(x, i);\n        double fi = 1.0 / (1.0 + exp(-(a * xi + b)));\n        gsl_vector_set(f, i, fi);\n    }\n}\n\nint main() {\n    const size_t n = 5;\n    const size_t p = 2;\n\n    // Hardcoded values for demonstration\n    double x[] = {1, 2, 3, 4, 5};\n    double y[] = {0.1, 0.3, 0.7, 0.85, 0.99};\n\n    gsl_vector_view xv = gsl_vector_view_array(x, n);\n    gsl_vector * X = gsl_vector_alloc(n);\n    for (size_t i = 0; i < n; ++i) {\n        gsl_vector_set(X, i, 1.0);\n    }\n\n    gsl_matrix * X2 = gsl_matrix_alloc(n, p);\n    for (size_t i = 0; i < n; ++i) {\n        gsl_matrix_set(X2, i, 0, x[i]);\n        gsl_matrix_set(X2, i, 1, 1.0);\n    }\n\n    gsl_vector * yv = gsl_vector_view_array(y, n);\n\n    gsl_multifit_function_fdf f;\n    f.f = NULL;\n    f.df = NULL;\n    f.fdf = &logistic_regression;\n    f.n = n;\n    f.p = p;\n\n    gsl_vector * params = gsl_vector_alloc(p);\n\n    gsl_multifit_fdfsolver * s = gsl_multifit_fdfsolver_alloc(gsl_multifit_fdfsolver_lmsder, n, p);\n    gsl_multifit_fdfsolver_set(s, &f, params);\n\n    int status;\n    size_t iter = 0;\n\n    do {\n        iter++;\n        status = gsl_multifit_fdfsolver_iterate(s);\n\n        if (status) break;\n\n        status = gsl_multifit_test_delta(s->dx, s->x, 1e-8, 1e-8);\n    } while (status == GSL_CONTINUE && iter < 1000);\n\n    std::cout << \"a = \" << gsl_vector_get(params, 0) << \", b = \" << gsl_vector_get(params, 1) << std::endl;\n\n    gsl_multifit_fdfsolver_free(s);\n    gsl_vector_free(params);\n    gsl_matrix_free(X2);\n    gsl_vector_free(X);\n\n    return 0;\n}\n```\n\nThis program performs logistic regression on a set of hardcoded data points (x, y). The output will be the parameters 'a' and 'b' of the logistic function.\n\nPlease note that this is a basic example and may not be suitable for real-world applications without further modifications. For instance, you might want to add error handling or use more advanced optimization algorithms.</s>\n\n### Instruction:\nThanks!\n\n### Response:\n"}

You can see in the follow up instruction that it doesn't add a space around the </s> token though.

BUT: deepseek-coder has no space between its <|begin▁of▁sentence|> and ### Instruction::

{"tid":"139732685623296","timestamp":1716131583,"level":"VERB","function":"update_slots","line":1955,"msg":"prompt tokenized","id_slot":0,"id_task":0,"n_ctx":16384,"n_keep":0,"n_prompt_tokens":42,"prompt_tokens":"<|begin▁of▁sentence|>### Instruction:\nCan you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values\n### Response:\n"}

The code is almost the same:

    } else if (tmpl == "alpaca" || (tmpl.find("### Instruction:") != std::string::npos && tmpl.find("<|EOT|>") == std::string::npos)) {
        // deepseek-ai/deepseek-coder-33b-instruct
        for (auto message : chat) {
            std::string role(message->role);
            if (role == "system") {
                ss << message->content << "\n\n";
            } else if (role == "user") {
                ss << "### Instruction:\n" << message->content << "\n\n";
            } else if (role == "assistant") {
                ss << "### Response:\n" << message->content << "</s>\n\n";
            }
        }
        if (add_ass) {
            ss << "### Response:\n";
        }
    }
    } else if (tmpl == "deepseek" || (tmpl.find("### Instruction:") != std::string::npos && tmpl.find("<|EOT|>") != std::string::npos)) {
        // deepseek-ai/deepseek-coder-33b-instruct
        for (auto message : chat) {
            std::string role(message->role);
            if (role == "system") {
                ss << message->content;
            } else if (role == "user") {
                ss << "### Instruction:\n" << message->content << "\n";
            } else if (role == "assistant") {
                ss << "### Response:\n" << message->content << "\n<|EOT|>\n";
            }   
        }   
        if (add_ass) {
            ss << "### Response:\n";
        }   
    }

as is the test template:

        // deepseek-ai/deepseek-coder-33b-instruct
        "You are a helpful assistant### Instruction:\nHello\n### Response:\nHi there\n<|EOT|>\n### Instruction:\nWho are you\n### Response:\n   I am an assistant   \n<|EOT|>\n### Instruction:\nAnother question\n### Response:\n",
        // meta-math/MetaMath-7B-V1.0
        "You are a helpful assistant\n\n### Instruction:\nHello\n\n### Response:\nHi there</s>\n\n### Instruction:\nWho are you\n\n### Response:\n   I am an assistant   </s>\n\n### Instruction:\nAnother question\n\n### Response:\n",

and both have:

llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  24:               tokenizer.ggml.add_eos_token bool             = false

I just can't for the life of me see why one inserts a space after its BOS token and one doesn't???

Not much else I can do and it's clear this PR won't work as phind-codellama is basically lobotomised by that space...

@jukofyork
Copy link
Contributor Author

Re-quantizing phind-codellama using convert.py to see if that makes any change.

@jukofyork
Copy link
Contributor Author

Nope the space is still here:

{"tid":"140235030601728","timestamp":1716132802,"level":"VERB","function":"update_slots","line":1955,"msg":"prompt tokenized","id_slot":0,"id_task":0,"n_ctx":16384,"n_keep":0,"n_prompt_tokens":44,"prompt_tokens":"<s> ### Instruction:\nCan you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values\n\n### Response:\n"}

@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

Nope it's completely useless with that extra space.

I've even tried the "official" phind-codellama template using "### System Prompt", "### User Message" and "### Assistant", and it is just completely lobotomised:


USER

Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values

ASSISTANT

Sure, I can provide you with a simple C++ program that uses GSL (GNU Scientific Library) to perform logistic regression. Please note that you have to install GSL library on your system to run this program.

#include <iostream>
#include <gsl/gsl_multifit.h>

void logistic_regression(const gsl_vector *x, void *params, gsl_vector *y) {
    double alpha = gsl_vector_get(x, 0);
    double beta = gsl_vector_get(x, 1);

    for (size_t i = 0; i < y->size; i++) {
        double xi = gsl_vector_get(params, i);
        double val = 1.0 / (1.0 + exp(-(alpha * xi + beta)));
        gsl_vector_set(y, i, val);
    }
}

int main() {
    const size_t n = 5;
    const size_t p = 2;

    gsl_vector *x = gsl_vector_alloc(n);
    gsl_vector *y = gsl_vector_alloc(n);

    // Fill x and y with hardcoded values
    for (size_t i = 0; i < n; ++i) {
        gsl_vector_set(x, i, i);
        gsl_vector_set(y, i, 1.0 / (1.0 + exp(-(3.0 * i + 2))));
    }

    gsl_multifit_linear_workspace *work = gsl_multifit_linear_alloc(n, p);

    gsl_vector *c = gsl_vector_alloc(p);
    gsl_matrix *cov = gsl_matrix_alloc(p, p);

    gsl_multifit_linear(x, y, c, cov, &logistic_regression);

    std::cout << "alpha = " << gsl_vector_get(c, 0) << ", beta = " << gsl_vector_get(c, 1) << std::endl;

    gsl_multifit_linear_free(work);
    gsl_vector_free(x);
    gsl_vector_free(y);
    gsl_vector_free(c);
    gsl_matrix_free(cov);

    return 0;
}

This program calculates the logistic regression for a set of hardcoded values. The logistic_regression function is used to calculate the logistic regression model. In the main function, we allocate memory for x and y vectors, initialize them with hardcoded values, and then call gsl_multifit_linear to perform the fit. Finally, we print out the calculated alpha and beta values.

Please note that this is a very basic example and doesn't include any error checking or handling. For real-world applications, you should add proper error handling code.

USER

Can you implement the f.f and f.df code too?

ASSISTANT

Sure, I can help with that. Here's a Python implementation of the f.f and f.df functions for logistic regression:

import numpy as np
from scipy import special

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_regression(X, y, beta):
    """
    Compute the logistic regression function for given X and parameters beta.

    Args:
        X : array_like
            The input array of shape (n_samples, n_features)
        y : array_like
            The output array of shape (n_samples,)
        beta : array_like
            The parameters of the model of shape (n_features,)

    Returns:
        f : float
            The value of the logistic regression function at X with parameters beta.
    """
    return np.sum(np.log(sigmoid(np.dot(X, beta))))

def logistic_regression_gradient(X, y, beta):
    """
    Compute the gradient of the logistic regression function for given X and parameters beta.

    Args:
        X : array_like
            The input array of shape (n_samples, n_features)
        y : array_like
            The output array of shape (n_samples,)
        beta : array_like
            The parameters of the model of shape (n_features,)

    Returns:
        df : ndarray
            The gradient of the logistic regression function at X with parameters beta.
    """
    z = np.dot(X, beta)
    s = sigmoid(z)

    return np.dot(X.T, (s - y)) / y.size

# Testing
from sklearn import datasets

data = datasets.load_breast_cancer()

X = data.data
y = data.target

beta = np.random.randn(X.shape[1])

f = logistic_regression(X, y, beta)
df = logistic_regression_gradient(X, y, beta)

print("f:", f)
print("df:", df)

This code defines the logistic_regression function to compute the value of the logistic regression function at X with parameters beta and logistic_regression_gradient to compute its gradient. The gradient is used in optimization algorithms like Gradient Descent to find the minimum of the cost function.

Please note that this code assumes that y values are either 0 or 1, as it's a binary classification problem. If you have more than two classes, you need to modify the code accordingly.

USER

No, I mean add them to the C++ code you created earlier!

ASSISTANT

Sure, I can help with that. But I need to know what you have in your C++ code so far and where exactly you want to add these new functions. Please provide more details or share your current code.

@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

I'll leave this PR up as it may point to some more serious bug with the initial '<s>' tokenization, but as it stands there is no point in merging this if that space is going to get added as all the 'alpaca' models are likely going to be just as broken.

@jukofyork jukofyork changed the title Add alpaca chat template Add alpaca chat template (currently useless due to strange space getting added between <s> and ### Instruction:...) May 19, 2024
@jukofyork
Copy link
Contributor Author

Amazingly that space is so harmful to phind-codellama that using the deepseek chat template (with completely the wrong newlines and EOS token!!!) actually works better:


USER

Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values

ASSISTANT

Sure, here is a simple C++ program that uses the GNU Scientific Library (GSL) to perform logistic regression. Note that you'll need to have GSL installed on your system to compile this code.

#include <iostream>
#include <gsl/gsl_multifit.h>

void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(x, 0);
    double b = gsl_vector_get(x, 1);

    size_t i;
    for (i = 0; i < f->size; ++i) {
        double yi = gsl_vector_get((gsl_vector *)params, i);
        double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
        gsl_vector_set(f, i, fi);
    }
}

int main() {
    const size_t n = 5;
    const size_t p = 2;

    // independent variables (X) and dependent variable (y)
    gsl_vector * X = gsl_vector_alloc(n);
    gsl_vector * y = gsl_vector_alloc(n);

    // set values of X and y
    gsl_vector_set(X, 0, 1.0);
    gsl_vector_set(X, 1, 2.0);
    gsl_vector_set(X, 2, 3.0);
    gsl_vector_set(X, 3, 4.0);
    gsl_vector_set(X, 4, 5.0);

    gsl_vector_set(y, 0, -1.0);
    gsl_vector_set(y, 1,  1.0);
    gsl_vector_set(y, 2, -1.0);
    gsl_vector_set(y, 3,  1.0);
    gsl_vector_set(y, 4, -1.0);

    // allocate space for the parameters (a and b)
    gsl_vector * params = gsl_vector_alloc(p);

    // set initial guess for parameters
    gsl_vector_set(params, 0, 0.0);
    gsl_vector_set(params, 1, 0.0);

    // allocate space for the fitted parameters
    gsl_vector * f = gsl_vector_alloc(p);

    // create a linear system and solve it
    const gsl_multifit_fsolver_type *T;
    gsl_multifit_fsolver *s;

    T = gsl_multifit_fsolver_lmsder;
    s = gsl_multifit_fsolver_alloc(T, n, p);

    gsl_multifit_function f_;
    f_.f = logistic_regression;
    f_.params = y;

    gsl_multifit_fsolver_set(s, &f_, params);

    int status;
    size_t iter = 0;

    do {
        iter++;
        status = gsl_multifit_fsolver_iterate(s);

        if (status)
            break;

        status = gsl_multifit_test_delta(params, f, 1e-7, 1e-7);
    } while (status == GSL_CONTINUE && iter < 1000);

    // print the results
    printf("a = %g\n", gsl_vector_get(params, 0));
    printf("b = %g\n", gsl_vector_get(params, 1));

    // free memory
    gsl_multifit_fsolver_free(s);
    gsl_vector_free(X);
    gsl_vector_free(y);
    gsl_vector_free(params);
    gsl_vector_free(f);

    return 0;
}

This program calculates logistic regression for a simple dataset with two independent variables (X) and one dependent variable (y). The logistic_regression function is the model that we are fitting to the data. The main function sets up the problem, solves it using GSL's Levenberg-Marquardt algorithm, and prints the results.

USER

Can you implement the f.f and f.df code too?

ASSISTANT

Sure, here's the implementation of f.f and f.df functions:

void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(x, 0);
    double b = gsl_vector_get(x, 1);

    size_t i;
    for (i = 0; i < f->size; ++i) {
        double yi = gsl_vector_get((gsl_vector *)params, i);
        double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
        gsl_vector_set(f, i, fi);
    }
}

void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
    double a = gsl_vector_get(x, 0);
    double b = gsl_vector_get(x, 1);

    size_t i;
    for (i = 0; i < J->size1; ++i) {
        double yi = gsl_vector_get((gsl_vector *)params, i);
        double fi = 1.0 / (1.0 + exp(-(a * yi + b)));

        gsl_matrix_set(J, i, 0, yi * fi * (1 - fi));
        gsl_matrix_set(J, i, 1,      fi * (1 - fi));
    }
}

int main() {
    // ...

    f_.f = logistic_regression;
    f_.df = logistic_regression_df;

    // ...
}

In this code, logistic_regression is the model function, and logistic_regression_df is its Jacobian matrix. The main function remains unchanged.

USER

Please write out the full code so I can run it!

ASSISTANT

Here is the full code:

#include <iostream>
#include <gsl/gsl_multifit.h>

void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(x, 0);
    double b = gsl_vector_get(x, 1);

    size_t i;
    for (i = 0; i < f->size; ++i) {
        double yi = gsl_vector_get((gsl_vector *)params, i);
        double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
        gsl_vector_set(f, i, fi);
    }
}

void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
    double a = gsl_vector_get(x, 0);
    double b = gsl_vector_get(x, 1);

    size_t i;
    for (i = 0; i < J->size1; ++i) {
        double yi = gsl_vector_get((gsl_vector *)params, i);
        double fi = 1.0 / (1.0 + exp(-(a * yi + b)));

        gsl_matrix_set(J, i, 0, yi * fi * (1 - fi));
        gsl_matrix_set(J, i, 1,      fi * (1 - fi));
    }
}

int main() {
    const size_t n = 5;
    const size_t p = 2;

    // independent variables (X) and dependent variable (y)
    gsl_vector * X = gsl_vector_alloc(n);
    gsl_vector * y = gsl_vector_alloc(n);

    // set values of X and y
    gsl_vector_set(X, 0, 1.0);
    gsl_vector_set(X, 1, 2.0);
    gsl_vector_set(X, 2, 3.0);
    gsl_vector_set(X, 3, 4.0);
    gsl_vector_set(X, 4, 5.0);

    gsl_vector_set(y, 0, -1.0);
    gsl_vector_set(y, 1,  1.0);
    gsl_vector_set(y, 2, -1.0);
    gsl_vector_set(y, 3,  1.0);
    gsl_vector_set(y, 4, -1.0);

    // allocate space for the parameters (a and b)
    gsl_vector * params = gsl_vector_alloc(p);

    // set initial guess for parameters
    gsl_vector_set(params, 0, 0.0);
    gsl_vector_set(params, 1, 0.0);

    // allocate space for the fitted parameters
    gsl_vector * f = gsl_vector_alloc(p);

    // create a linear system and solve it
    const gsl_multifit_fsolver_type *T;
    gsl_multifit_fsolver *s;

    T = gsl_multifit_fsolver_lmsder;
    s = gsl_multifit_fsolver_alloc(T, n, p);

    gsl_multifit_function f_;
    f_.f = logistic_regression;
    f_.df = logistic_regression_df;
    f_.params = y;

    gsl_multifit_fsolver_set(s, &f_, params);

    int status;
    size_t iter = 0;

    do {
        iter++;
        status = gsl_multifit_fsolver_iterate(s);

        if (status)
            break;

        status = gsl_multifit_test_delta(params, f, 1e-7, 1e-7);
    } while (status == GSL_CONTINUE && iter < 1000);

    // print the results
    printf("a = %g\n", gsl_vector_get(params, 0));
    printf("b = %g\n", gsl_vector_get(params, 1));

    // free memory
    gsl_multifit_fsolver_free(s);
    gsl_vector_free(X);
    gsl_vector_free(y);
    gsl_vector_free(params);
    gsl_vector_free(f);

    return 0;
}

This code calculates logistic regression for a simple dataset with two independent variables (X) and one dependent variable (y). The logistic_regression function is the model that we are fitting to the data. The main function sets up the problem, solves it using GSL's Levenberg-Marquardt algorithm, and prints the results.


I think it would probably be a good idea to:

  1. Find exactly why that space is getting added for this but not for the deepseek-coder.
  2. Double check if any of the other chat templates that use '<s>' are having it added.

@jukofyork
Copy link
Contributor Author

I'm not even sure if it is the space that is causing it now, as this is the tokenization when using the deepseek chat template:

{"tid":"140097807843328","timestamp":1716135105,"level":"VERB","function":"update_slots","line":1955,"msg":"prompt tokenized","id_slot":0,"id_task":0,"n_ctx":16384,"n_keep":0,"n_prompt_tokens":43,"prompt_tokens":"<s> ### Instruction:\nCan you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values\n### Response:\n"}

that produces "sane" output from phind-codellama.

@jukofyork jukofyork changed the title Add alpaca chat template (currently useless due to strange space getting added between <s> and ### Instruction:...) Add alpaca chat template May 19, 2024
@jukofyork
Copy link
Contributor Author

I've figured it out now! The oobabooga/text-generation-webui/ template it actually correct:

https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Alpaca.yaml

and there is no </s> EOS to be added!

This now agrees with the Ollama-templated version for phind-codellama:


USER

Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values

ASSISTANT

Sure, here's a simple C++ program that uses the GNU Scientific Library (GSL) to perform logistic regression. Note that you need to have GSL installed on your system to compile this code.

#include <iostream>
#include <gsl/gsl_multifit.h>

void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(params, 0);
    double b = gsl_vector_get(params, 1);

    size_t n = x->size;

    for (size_t i = 0; i < n; ++i) {
        double xi = gsl_vector_get(x, i);
        double fi = 1.0 / (1.0 + exp(-(a * xi + b)));
        gsl_vector_set(f, i, fi);
    }
}

int main() {
    const size_t n = 5;
    const size_t p = 2;

    // Hardcoded values for demonstration
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {0.1, 0.3, 0.7, 0.85, 0.99};

    gsl_vector_view xv = gsl_vector_view_array(x, n);
    gsl_vector * X = gsl_vector_alloc(n);
    for (size_t i = 0; i < n; ++i) {
        gsl_vector_set(X, i, 1.0);
    }

    gsl_matrix * X2 = gsl_matrix_alloc(n, p);
    for (size_t i = 0; i < n; ++i) {
        gsl_matrix_set(X2, i, 0, x[i]);
        gsl_matrix_set(X2, i, 1, 1.0);
    }

    gsl_vector * yv = gsl_vector_view_array(y, n);

    gsl_multifit_function_fdf f;
    f.f = NULL;
    f.df = NULL;
    f.fdf = &logistic_regression;
    f.n = n;
    f.p = p;

    gsl_vector * params = gsl_vector_alloc(p);

    gsl_multifit_fdfsolver * s = gsl_multifit_fdfsolver_alloc(gsl_multifit_fdfsolver_lmsder, n, p);
    gsl_multifit_fdfsolver_set(s, &f, params);

    int status;
    size_t iter = 0;

    do {
        iter++;
        status = gsl_multifit_fdfsolver_iterate(s);

        if (status) break;

        status = gsl_multifit_test_delta(s->dx, s->x, 1e-8, 1e-8);
    } while (status == GSL_CONTINUE && iter < 1000);

    std::cout << "a = " << gsl_vector_get(params, 0) << ", b = " << gsl_vector_get(params, 1) << std::endl;

    gsl_multifit_fdfsolver_free(s);
    gsl_vector_free(params);
    gsl_matrix_free(X2);
    gsl_vector_free(X);

    return 0;
}

This program performs logistic regression on a set of hardcoded data points (x, y). The output will be the parameters 'a' and 'b' of the logistic function.

Please note that this is a basic example and may not be suitable for real-world applications without further modifications. For instance, you might want to add error handling or use more advanced optimization algorithms.

USER

Can you implement the f.f and f.df code too?

ASSISTANT

Sure, here's the implementation of f.f and f.df:

void logistic_regression_f(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(params, 0);
    double b = gsl_vector_get(params, 1);

    size_t n = x->size;

    for (size_t i = 0; i < n; ++i) {
        double xi = gsl_vector_get(x, i);
        double fi = a * xi + b;
        gsl_vector_set(f, i, fi);
    }
}

void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
    size_t n = x->size;

    for (size_t i = 0; i < n; ++i) {
        double xi = gsl_vector_get(x, i);
        gsl_matrix_set(J, i, 0, xi);
        gsl_matrix_set(J, i, 1, 1.0);
    }
}

int main() {
    // ...

    f.f = &logistic_regression_f;
    f.df = &logistic_regression_df;

    // ...
}

These two functions calculate the model function a * x + b and its Jacobian matrix, respectively. The main function remains unchanged.

USER

Please write out the full code so I can run it!

ASSISTANT

Here's the full code with f.f and f.df implemented:

#include <iostream>
#include <gsl/gsl_multifit.h>

void logistic_regression_f(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(params, 0);
    double b = gsl_vector_get(params, 1);

    size_t n = x->size;

    for (size_t i = 0; i < n; ++i) {
        double xi = gsl_vector_get(x, i);
        double fi = a * xi + b;
        gsl_vector_set(f, i, fi);
    }
}

void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
    size_t n = x->size;

    for (size_t i = 0; i < n; ++i) {
        double xi = gsl_vector_get(x, i);
        gsl_matrix_set(J, i, 0, xi);
        gsl_matrix_set(J, i, 1, 1.0);
    }
}

void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
    double a = gsl_vector_get(params, 0);
    double b = gsl_vector_get(params, 1);

    size_t n = x->size;

    for (size_t i = 0; i < n; ++i) {
        double xi = gsl_vector_get(x, i);
        double fi = 1.0 / (1.0 + exp(-(a * xi + b)));
        gsl_vector_set(f, i, fi);
    }
}

int main() {
    const size_t n = 5;
    const size_t p = 2;

    // Hardcoded values for demonstration
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {0.1, 0.3, 0.7, 0.85, 0.99};

    gsl_vector_view xv = gsl_vector_view_array(x, n);
    gsl_vector * X = gsl_vector_alloc(n);
    for (size_t i = 0; i < n; ++i) {
        gsl_vector_set(X, i, 1.0);
    }

    gsl_matrix * X2 = gsl_matrix_alloc(n, p);
    for (size_t i = 0; i < n; ++i) {
        gsl_matrix_set(X2, i, 0, x[i]);
        gsl_matrix_set(X2, i, 1, 1.0);
    }

    gsl_vector * yv = gsl_vector_view_array(y, n);

    gsl_multifit_function_fdf f;
    f.f = &logistic_regression_f;
    f.df = &logistic_regression_df;
    f.fdf = &logistic_regression;
    f.n = n;
    f.p = p;

    gsl_vector * params = gsl_vector_alloc(p);

    gsl_multifit_fdfsolver * s = gsl_multifit_fdfsolver_alloc(gsl_multifit_fdfsolver_lmsder, n, p);
    gsl_multifit_fdfsolver_set(s, &f, params);

    int status;
    size_t iter = 0;

    do {
        iter++;
        status = gsl_multifit_fdfsolver_iterate(s);

        if (status) break;

        status = gsl_multifit_test_delta(s->dx, s->x, 1e-8, 1e-8);
    } while (status == GSL_CONTINUE && iter < 1000);

    std::cout << "a = " << gsl_vector_get(params, 0) << ", b = " << gsl_vector_get(params, 1) << std::endl;

    gsl_multifit_fdfsolver_free(s);
    gsl_vector_free(params);
    gsl_matrix_free(X2);
    gsl_vector_free(X);

    return 0;
}

This code should compile and run on a system with GSL installed. The output will be the parameters 'a' and 'b' of the logistic function.

@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

This is the code that I used to generate the tests using the oobabooga/text-generation-webui template if the wiki needs to be updated:

from transformers import AutoTokenizer

VARIANTS_TO_TEST = [
    'deepseek-ai/deepseek-coder-33b-instruct',
    'meta-math/MetaMath-7B-V1.0',
]

HISTORY = [
    { 'role': 'system', 'content': 'You are a helpful assistant' },
    { 'role': 'user', 'content': 'Hello' },
    { 'role': 'assistant', 'content': 'Hi there' },
    { 'role': 'user', 'content': 'Who are you' },
    { 'role': 'assistant', 'content': '   I am an assistant   ' },
    { 'role': 'user', 'content': 'Another question' },
]

for variant in VARIANTS_TO_TEST:
    history = [m for m in HISTORY] # copy
    if 'meta-math' in variant:
        print("\n----- Alpaca -----")
        ALPACA_TMPL = "{%- set ns = namespace(found=false) -%} {%- for message in messages -%} {%- if message['role'] == 'system' -%} {%- set ns.found = true -%} {%- endif -%} {%- endfor -%} {%- if not ns.found -%} {{- '' + 'Below is an instruction that describes a task. Write a response that appropriately completes the request.' + '\n\n' -}} {%- endif %} {%- for message in messages %} {%- if message['role'] == 'system' -%} {{- '' + message['content'] + '\n\n' -}} {%- else -%} {%- if message['role'] == 'user' -%} {{-'### Instruction:\n' + message['content'] + '\n\n'-}} {%- else -%} {{-'### Response:\n' + message['content'] + '\n\n' -}} {%- endif -%} {%- endif -%} {%- endfor -%} {%- if add_generation_prompt -%} {{-'### Response:\n'-}} {%- endif -%}"
        output = AutoTokenizer.from_pretrained(VARIANTS_TO_TEST[1]).apply_chat_template(history, tokenize=False, add_generation_prompt=True, chat_template=ALPACA_TMPL)
        print(output)
        print("\n[Test String]\n// meta-math/MetaMath-7B-V1.0")
        print(output.replace("\n", "\\n"))
        print('"' + output.replace("\n", "\\n") + '",')
    else:
        print("\n----- " + variant + " -----")
        tokenizer = AutoTokenizer.from_pretrained(variant)
        output = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
        print(output)
        print("\n[Test String]\n// " + variant)
        print('"' + output.replace("\n", "\\n") + '",')

@jukofyork
Copy link
Contributor Author

jukofyork commented May 19, 2024

If anybody can find a copy of the old wizard coder models that also allowed for "multi-turn alpaca" it would be nice to test on them too, but there don't seem to be any mirrors on huggingface I can find...

Actually I've just found I have a copy of the 34b model! I will test and report back.

@jukofyork
Copy link
Contributor Author

Here is WizardCoder-Python-34B-V1.0 following multi-turn instructions fine, so I'm pretty sure this will work with any of the other multi-turn alpaca models too:


USER

Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values

ASSISTANT

Sure, here's an example program that calculates logistic regression using GSL:

#include <iostream>
#include <gsl/gsl_multifit.h>

int main() {
    // Define the input data
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {0.1, 0.2, 0.3, 0.4, 0.5};

    // Define the parameters
    int n = sizeof(x) / sizeof(x[0]);
    gsl_vector *X = gsl_vector_alloc(n);
    gsl_vector *yv = gsl_vector_alloc(n);

    // Copy the data into GSL vectors
    for (int i = 0; i < n; ++i) {
        gsl_vector_set(X, i, x[i]);
        gsl_vector_set(yv, i, y[i]);
    }

    // Define the design matrix
    gsl_matrix *design = gsl_matrix_alloc(n, 2);
    for (int i = 0; i < n; ++i) {
        gsl_matrix_set(design, i, 0, 1);
        gsl_matrix_set(design, i, 1, x[i]);
    }

    // Define the coefficients
    gsl_vector *c = gsl_vector_alloc(2);

    // Solve for the coefficients
    int status = gsl_multifit_linear(X, yv, design, c);

    if (status) {
        std::cerr << "Error: " << gsl_strerror(status) << std::endl;
        return 1;
    }

    // Print the coefficients
    double a = gsl_vector_get(c, 0);
    double b = gsl_vector_get(c, 1);
    std::cout << "a: " << a << std::endl;
    std::cout << "b: " << b << std::endl;

    // Clean up
    gsl_vector_free(X);
    gsl_vector_free(yv);
    gsl_matrix_free(design);
    gsl_vector_free(c);

    return 0;
}

This program assumes that the input data is in the form of an array x and an array y, where each element of y corresponds to a value of the dependent variable for a given value of x. It creates a design matrix with two columns: one for the constant term (all 1's) and one for the input data. It then uses GSL's gsl_multifit_linear function to solve for the coefficients a and b that minimize the sum of squared errors between the predicted values and the actual values. The resulting coefficients are printed to standard output.

Note that this program assumes that you have already installed GSL on your system and linked against it in your compiler settings. If you haven't done this yet, you can find instructions for installing GSL here: https://www.gnu.org/software/gsl/doc/html/usage.html#compiling-and-linking

Also note that this program assumes a simple linear model with one independent variable (x). If you have more than one independent variable, you'll need to modify the design matrix accordingly.

USER

This is just linear regression though??? I wanted logistic regression!

ASSISTANT

You're right, this is just linear regression. Here's an example program that calculates logistic regression using GSL:

#include <iostream>
#include <gsl/gsl_multifit.h>

int main() {
    // Define the input data
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {0.1, 0.2, 0.3, 0.4};

    // Define the parameters
    int n = sizeof(x) / sizeof(x[0]);
    gsl_vector *X = gsl_vector_alloc(n);
    gsl_vector *yv = gsl_vector_alloc(n);

    // Copy the data into GSL vectors
    for (int i = 0; i < n; ++i) {
        gsl_vector_set(X, i, x[i]);
        gsl_vector_set(yv, i, y[i]);
    }

    // Define the design matrix
    gsl_matrix *design = gsl_matrix_alloc(n, 2);
    for (int i = 0; i < n; ++i) {
        gsl_matrix_set(design, i, 0, 1);
        gsl_matrix_set(design, i, 1, x[i]);
    }

    // Define the coefficients
    gsl_vector *c = gsl_vector_alloc(2);

    // Solve for the coefficients
    int status = gsl_multifit_linear(X, yv, design, c);

    if (status) {
        std::cerr << "Error: " << gsl_strerror(status) << std::endl;
        return 1;
    }

    // Calculate the logistic function for each input value
    double a = gsl_vector_get(c, 0);
    double b = gsl_vector_get(c, 1);
    for (int i = 0; i < n; ++i) {
        double xi = gsl_vector_get(X, i);
        double yi = 1.0 / (1.0 + exp(-(a + b * xi));
        std::cout << "x: " << x[i] << ", y: " << y[i] << ", predicted: " << yi << std::endl;
    }

    // Clean up
    gsl_vector_free(X);
    gsl_vector_free(yv);
    gsl_matrix_free(design);
    gsl_vector_free(c);

    return 0;
}

This program calculates the coefficients a and b using linear regression, then applies the logistic function to each input value to get the predicted output. The predicted output is printed to standard output. Note that this program assumes a simple logistic model with one independent variable (x). If you have more than one independent variable, you'll need to modify the design matrix accordingly.

USER

No this is completely wrong... let's start again and make sure to use gsl_multifit_function_fdf

ASSISTANT

Sure, here's an example program that calculates logistic regression using GSL's gsl_multifit_function_fdf function:

#include <iostream>
#include <gsl/gsl_multifit.h>

int main() {
    // Define the input data
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {0.1, 0.2, 0.3, 0.4};

    // Define the parameters
    int n = sizeof(x) / sizeof(x[0]);
    gsl_vector *X = gsl_vector_alloc(n);
    gsl_vector *yv = gsl_vector_alloc(n);

    // Copy the data into GSL vectors
    for (int i = 0; i < n; ++i) {
        gsl_vector_set(X, i, x[i]);
        gsl_vector_set(yv, i, y[i]);
    }

    // Define the design matrix
    gsl_matrix *design = gsl_matrix_alloc(n, 2);
    for (int i = 0; i < n; ++i) {
        gsl_matrix_set(design, i, 0, 1);
        gsl_matrix_set(design, i, 1, x[i]);
    }

    // Define the function and its derivatives
    struct {
        double (*f)(const gsl_vector *x, void *params);
        void (*df)(const gsl_vector *x, void *params, gsl_matrix *J);
        void (*fdf)(const gsl_vector *x, void *params, double *f, gsl_matrix *J);
    } F;

    F.f = [](const gsl_vector *c, void *) -> double {
        double a = gsl_vector_get(c, 0);
        double b = gsl_vector_get(c, 1);
        double sum = 0.0;

        for (int i = 0; i < n; ++i) {
            double xi = gsl_vector_get(X, i);
            double yi = gsl_vector_get(yv, i);
            sum += log(1.0 + exp(-yi * (a + b * xi));
        }

        return sum;
    };

    F.df = [](const gsl_vector *c, void *params, gsl_matrix *J) {
        double a = gsl_vector_get(c, 0);
        double b = gsl_vector_get(c, 1);

        for (int i = 0; i < n; ++i) {
            double xi = gsl_vector_get(X, i);
            double yi = gsl_vector_get(yv, i);
            double e = exp(-yi * (a + b * xi));

            gsl_matrix_set(J, i, 0, -yi * e / (1.0 + e));
            gsl_matrix_set(J, i, 1, -yi * xi * e / (1.0 + e));
        }
    };

    F.fdf = [](const gsl_vector *c, void *params, double *f, gsl_matrix *J) {
        F.f(c, params);
        F.df(c, params, J);
    };

    // Solve for the coefficients
    gsl_multifit_function f;
    f.n = n;
    f.p = 2;
    f.f = &F.f;
    f.df = &F.df;
    f.fdf = &F.fdf;

    gsl_multifit_function_fdf FDF;
    FDF.f = &F.f;
    FDF.df = &F.df;
    FDF.fdf = &F.fdf;

    gsl_vector *c = gsl_vector_alloc(2);
    gsl_multifit_fsolver *solver = gsl_multifit_fsolver_alloc(gsl_multifit_fsolver_lmsder, n, 2);
    gsl_multifit_fsolver_set(solver, &f, c);

    int status = gsl_multifit_fsolver_iterate(solver);
    if (status) {
        std::cerr << "Error: " << gsl_strerror(status) << std::endl;
        return 1;
    }

    status = gsl_multifit_fsolver_solve(solver);
    if (status) {
        std::cerr << "Error: " << gsl_strerror(status) << std::endl;
        return 1;
    }

    // Print the coefficients
    double a = gsl_vector_get(c, 0);
    double b = gsl_vector_get(c, 1);
    std::cout << "a: " << a << std::endl;
    std::cout << "b: " << b << std::endl;

    // Clean up
    gsl_vector_free(X);
    gsl_vector_free(yv);
    gsl_matrix_free(design);
    gsl_multifit_fsolver_free(solver);

    return 0;
}

This program uses gsl_multifit_function and gsl_multifit_function_fdf to define the logistic function and its derivatives, then uses gsl_multifit_fsolver to find the coefficients that minimize the sum of squared errors between the predicted values and the actual values. The resulting coefficients are printed to standard output. Note that this program assumes a simple logistic model with one independent variable (x). If you have more than one independent variable, you'll need to modify the design matrix accordingly.

USER

You need to call gsl_multifit_fdfsolver_iterate in a loop though???

ASSISTANT

Yes, you're right. Here's an updated program that iterates until convergence:

#include <iostream>
#include <gsl/gsl_multifit.h>

int main() {
    // Define the input data
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {0.1, 0.2, 0.3, 0.4};

    // Define the parameters
    int n = sizeof(x) / sizeof(x[0]);
    gsl_vector *X = gsl_vector_alloc(n);
    gsl_vector *yv = gsl_vector_alloc(n);

    // Copy the data into GSL vectors
    for (int i = 0; i < n; ++i) {
        gsl_vector_set(X, i, x[i]);
        gsl_vector_set(yv, i, y[i]);
    }

    // Define the design matrix
    gsl_matrix *design = gsl_matrix_alloc(n, 2);
    for (int i = 0; i < n; ++i) {
        gsl_matrix_set(design, i, 0, 1);
        gsl_matrix_set(design, i, 1, x[i]);
    }

    // Define the function and its derivatives
    struct {
        double (*f)(const gsl_vector *x, void *params);
        void (*df)(const gsl_vector *x, void *params, gsl_matrix *J);
        void (*fdf)(const gsl_vector *x, void *params, double *f, gsl_matrix *J);
    } F;

    F.f = [](const gsl_vector *c, void *) -> double {
        double a = gsl_vector_get(c, 0);
        double b = gsl_vector_get(c, 1);
        double sum = 0.0;

        for (int i = 0; i < n; ++i) {
            double xi = gsl_vector_get(X, i);
            double yi = gsl_vector_get(yv, i);
            sum += log(1.0 + exp(-yi * (a + b * xi));
        }

        return sum;
    };

    F.df = [](const gsl_vector *c, void *params, gsl_matrix *J) {
        double a = gsl_vector_get(c, 0);
        double b = gsl_vector_get(c, 1);

        for (int i = 0; i < n; ++i) {
            double xi = gsl_vector_get(X, i);
            double yi = gsl_vector_get(yv, i);
            double e = exp(-yi * (a + b * xi));

            gsl_matrix_set(J, i, 0, -yi * e / (1.0 + e));
            gsl_matrix_set(J, i, 1, -yi * xi * e / (1.0 + e));
        }
    };

    F.fdf = [](const gsl_vector *c, void *params, double *f, gsl_matrix *J) {
        F.f(c, params);
        F.df(c, params, J);
    };

    // Solve for the coefficients
    gsl_multifit_function f;
    f.n = n;
    f.p = 2;
    f.f = &F.f;
    f.df = &F.df;
    f.fdf = &F.fdf;

    gsl_multifit_function_fdf FDF;
    FDF.f = &F.f;
    FDF.df = &F.df;
    FDF.fdf = &F.fdf;

    gsl_vector *c = gsl_vector_alloc(2);
    gsl_multifit_fsolver *solver = gsl_multifit_fsolver_alloc(gsl_multifit_fsolver_lmsder, n, 2);
    gsl_multifit_fsolver_set(solver, &f, c);

    int status;
    do {
        status = gsl_multifit_fsolver_iterate(solver);
        if (status) {
            std::cerr << "Error: " << gsl_strerror(status) << std::endl;
            return 1;
        }
    } while (status == GSL_CONTINUE);

    // Print the coefficients
    double a = gsl_vector_get(c, 0);
    double b = gsl_vector_get(c, 1);
    std::cout << "a: " << a << std::endl;
    std::cout << "b: " << b << std::endl;

    // Clean up
    gsl_vector_free(X);
    gsl_vector_free(yv);
    gsl_matrix_free(design);
    gsl_multifit_fsolver_free(solver);

    return 0;
}

This program uses gsl_multifit_fdfsolver to define the logistic function and its derivatives, then iterates until convergence using gsl_multifit_fsolver. The resulting coefficients are printed to standard output. Note that this program assumes a simple logistic model with one independent variable (x). If you have more than one independent variable, you'll need to modify the design matrix accordingly.

Copy link
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 548 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8528.82ms p(95)=20595.61ms fails=, finish reason: stop=490 truncated=58
  • Prompt processing (pp): avg=97.24tk/s p(95)=429.33tk/s
  • Token generation (tg): avg=34.02tk/s p(95)=46.78tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=Add-Alpaca-chat-template commit=5ef84eadd69f9dc4eb9c9e8396b05953e3694e49

prompt_tokens_seconds

More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 548 iterations"
    y-axis "llamacpp:prompt_tokens_seconds"
    x-axis "llamacpp:prompt_tokens_seconds" 1716159196 --> 1716159820
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 322.12, 322.12, 322.12, 322.12, 322.12, 863.29, 863.29, 863.29, 863.29, 863.29, 848.53, 848.53, 848.53, 848.53, 848.53, 861.07, 861.07, 861.07, 861.07, 861.07, 926.73, 926.73, 926.73, 926.73, 926.73, 934.7, 934.7, 934.7, 934.7, 934.7, 924.93, 924.93, 924.93, 924.93, 924.93, 949.52, 949.52, 949.52, 949.52, 949.52, 954.03, 954.03, 954.03, 954.03, 954.03, 948.08, 948.08, 948.08, 948.08, 948.08, 957.61, 957.61, 957.61, 957.61, 957.61, 987.5, 987.5, 987.5, 987.5, 987.5, 976.56, 976.56, 976.56, 976.56, 976.56, 993.8, 993.8, 993.8, 993.8, 993.8, 997.4, 997.4, 997.4, 997.4, 997.4, 994.57, 994.57, 994.57, 994.57, 994.57, 993.01, 993.01, 993.01, 993.01, 993.01, 987.32, 987.32, 987.32, 987.32, 987.32, 975.37, 975.37, 975.37, 975.37, 975.37, 970.72, 970.72, 970.72, 970.72, 970.72, 969.08, 969.08, 969.08, 969.08, 969.08, 965.82, 965.82, 965.82, 965.82, 965.82, 967.8, 967.8, 967.8, 967.8, 967.8, 978.05, 978.05, 978.05, 978.05, 978.05, 976.13, 976.13, 976.13, 976.13, 976.13, 976.87, 976.87, 976.87, 976.87, 976.87, 976.89, 976.89, 976.89, 976.89, 976.89, 973.16, 973.16, 973.16, 973.16, 973.16, 971.44, 971.44, 971.44, 971.44, 971.44, 972.9, 972.9, 972.9, 972.9, 972.9, 970.25, 970.25, 970.25, 970.25, 970.25, 966.62, 966.62, 966.62, 966.62, 966.62, 965.09, 965.09, 965.09, 965.09, 965.09, 970.96, 970.96, 970.96, 970.96, 970.96, 970.48, 970.48, 970.48, 970.48, 970.48, 966.25, 966.25, 966.25, 966.25, 966.25, 904.38, 904.38, 904.38, 904.38, 904.38, 900.75, 900.75, 900.75, 900.75, 900.75, 903.89, 903.89, 903.89, 903.89, 903.89, 904.09, 904.09, 904.09, 904.09, 904.09, 903.9, 903.9, 903.9, 903.9, 903.9, 910.65, 910.65, 910.65, 910.65, 910.65, 911.84, 911.84, 911.84, 911.84, 911.84, 908.7, 908.7, 908.7, 908.7, 908.7, 907.92, 907.92, 907.92, 907.92, 907.92, 904.36, 904.36, 904.36, 904.36, 904.36, 905.99, 905.99, 905.99, 905.99, 905.99, 907.84, 907.84, 907.84, 907.84, 907.84, 906.13, 906.13, 906.13, 906.13, 906.13, 909.6, 909.6, 909.6, 909.6, 909.6, 908.34, 908.34, 908.34, 908.34, 908.34, 912.49, 912.49, 912.49, 912.49, 912.49, 911.36, 911.36, 911.36, 911.36, 911.36, 914.05, 914.05, 914.05, 914.05, 914.05, 912.39, 912.39, 912.39, 912.39, 912.39, 913.59, 913.59, 913.59, 913.59, 913.59, 912.41, 912.41, 912.41, 912.41, 912.41, 912.35, 912.35, 912.35, 912.35, 912.35, 909.39, 909.39, 909.39, 909.39, 909.39, 911.85, 911.85, 911.85, 911.85, 911.85, 911.43, 911.43, 911.43]
                    
predicted_tokens_seconds
More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 548 iterations"
    y-axis "llamacpp:predicted_tokens_seconds"
    x-axis "llamacpp:predicted_tokens_seconds" 1716159196 --> 1716159820
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 38.71, 38.71, 38.71, 38.71, 38.71, 33.27, 33.27, 33.27, 33.27, 33.27, 26.97, 26.97, 26.97, 26.97, 26.97, 29.75, 29.75, 29.75, 29.75, 29.75, 29.69, 29.69, 29.69, 29.69, 29.69, 30.47, 30.47, 30.47, 30.47, 30.47, 32.33, 32.33, 32.33, 32.33, 32.33, 33.35, 33.35, 33.35, 33.35, 33.35, 33.7, 33.7, 33.7, 33.7, 33.7, 33.76, 33.76, 33.76, 33.76, 33.76, 33.86, 33.86, 33.86, 33.86, 33.86, 33.48, 33.48, 33.48, 33.48, 33.48, 32.65, 32.65, 32.65, 32.65, 32.65, 32.31, 32.31, 32.31, 32.31, 32.31, 31.7, 31.7, 31.7, 31.7, 31.7, 29.97, 29.97, 29.97, 29.97, 29.97, 30.02, 30.02, 30.02, 30.02, 30.02, 30.25, 30.25, 30.25, 30.25, 30.25, 30.07, 30.07, 30.07, 30.07, 30.07, 29.93, 29.93, 29.93, 29.93, 29.93, 29.86, 29.86, 29.86, 29.86, 29.86, 29.91, 29.91, 29.91, 29.91, 29.91, 30.12, 30.12, 30.12, 30.12, 30.12, 30.02, 30.02, 30.02, 30.02, 30.02, 30.15, 30.15, 30.15, 30.15, 30.15, 30.49, 30.49, 30.49, 30.49, 30.49, 30.41, 30.41, 30.41, 30.41, 30.41, 30.47, 30.47, 30.47, 30.47, 30.47, 30.64, 30.64, 30.64, 30.64, 30.64, 30.89, 30.89, 30.89, 30.89, 30.89, 30.96, 30.96, 30.96, 30.96, 30.96, 31.07, 31.07, 31.07, 31.07, 31.07, 31.21, 31.21, 31.21, 31.21, 31.21, 31.02, 31.02, 31.02, 31.02, 31.02, 30.96, 30.96, 30.96, 30.96, 30.96, 30.39, 30.39, 30.39, 30.39, 30.39, 30.21, 30.21, 30.21, 30.21, 30.21, 30.26, 30.26, 30.26, 30.26, 30.26, 30.4, 30.4, 30.4, 30.4, 30.4, 30.56, 30.56, 30.56, 30.56, 30.56, 30.59, 30.59, 30.59, 30.59, 30.59, 30.72, 30.72, 30.72, 30.72, 30.72, 30.49, 30.49, 30.49, 30.49, 30.49, 29.84, 29.84, 29.84, 29.84, 29.84, 29.78, 29.78, 29.78, 29.78, 29.78, 28.97, 28.97, 28.97, 28.97, 28.97, 28.59, 28.59, 28.59, 28.59, 28.59, 28.58, 28.58, 28.58, 28.58, 28.58, 28.61, 28.61, 28.61, 28.61, 28.61, 28.64, 28.64, 28.64, 28.64, 28.64, 28.73, 28.73, 28.73, 28.73, 28.73, 28.82, 28.82, 28.82, 28.82, 28.82, 28.79, 28.79, 28.79, 28.79, 28.79, 28.81, 28.81, 28.81, 28.81, 28.81, 28.66, 28.66, 28.66, 28.66, 28.66, 28.71, 28.71, 28.71, 28.71, 28.71, 28.85, 28.85, 28.85, 28.85, 28.85, 28.91, 28.91, 28.91, 28.91, 28.91, 28.98, 28.98, 28.98, 28.98, 28.98, 29.09, 29.09, 29.09, 29.09, 29.09, 29.11, 29.11, 29.11]
                    

Details

kv_cache_usage_ratio

More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 548 iterations"
    y-axis "llamacpp:kv_cache_usage_ratio"
    x-axis "llamacpp:kv_cache_usage_ratio" 1716159196 --> 1716159820
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.14, 0.14, 0.14, 0.14, 0.14, 0.36, 0.36, 0.36, 0.36, 0.36, 0.15, 0.15, 0.15, 0.15, 0.15, 0.1, 0.1, 0.1, 0.1, 0.1, 0.11, 0.11, 0.11, 0.11, 0.11, 0.17, 0.17, 0.17, 0.17, 0.17, 0.11, 0.11, 0.11, 0.11, 0.11, 0.16, 0.16, 0.16, 0.16, 0.16, 0.17, 0.17, 0.17, 0.17, 0.17, 0.19, 0.19, 0.19, 0.19, 0.19, 0.2, 0.2, 0.2, 0.2, 0.2, 0.26, 0.26, 0.26, 0.26, 0.26, 0.19, 0.19, 0.19, 0.19, 0.19, 0.31, 0.31, 0.31, 0.31, 0.31, 0.35, 0.35, 0.35, 0.35, 0.35, 0.19, 0.19, 0.19, 0.19, 0.19, 0.16, 0.16, 0.16, 0.16, 0.16, 0.13, 0.13, 0.13, 0.13, 0.13, 0.27, 0.27, 0.27, 0.27, 0.27, 0.3, 0.3, 0.3, 0.3, 0.3, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.15, 0.15, 0.15, 0.15, 0.15, 0.19, 0.19, 0.19, 0.19, 0.19, 0.15, 0.15, 0.15, 0.15, 0.15, 0.22, 0.22, 0.22, 0.22, 0.22, 0.14, 0.14, 0.14, 0.14, 0.14, 0.1, 0.1, 0.1, 0.1, 0.1, 0.11, 0.11, 0.11, 0.11, 0.11, 0.15, 0.15, 0.15, 0.15, 0.15, 0.16, 0.16, 0.16, 0.16, 0.16, 0.11, 0.11, 0.11, 0.11, 0.11, 0.14, 0.14, 0.14, 0.14, 0.14, 0.31, 0.31, 0.31, 0.31, 0.31, 0.38, 0.38, 0.38, 0.38, 0.38, 0.33, 0.33, 0.33, 0.33, 0.33, 0.26, 0.26, 0.26, 0.26, 0.26, 0.16, 0.16, 0.16, 0.16, 0.16, 0.13, 0.13, 0.13, 0.13, 0.13, 0.1, 0.1, 0.1, 0.1, 0.1, 0.17, 0.17, 0.17, 0.17, 0.17, 0.3, 0.3, 0.3, 0.3, 0.3, 0.58, 0.58, 0.58, 0.58, 0.58, 0.46, 0.46, 0.46, 0.46, 0.46, 0.55, 0.55, 0.55, 0.55, 0.55, 0.44, 0.44, 0.44, 0.44, 0.44, 0.17, 0.17, 0.17, 0.17, 0.17, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.18, 0.18, 0.18, 0.18, 0.18, 0.12, 0.12, 0.12, 0.12, 0.12, 0.23, 0.23, 0.23, 0.23, 0.23, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.11, 0.11, 0.11, 0.11, 0.11, 0.16, 0.16, 0.16, 0.16, 0.16, 0.2, 0.2, 0.2, 0.2, 0.2, 0.13, 0.13, 0.13, 0.13, 0.13, 0.19, 0.19, 0.19, 0.19, 0.19, 0.17, 0.17, 0.17, 0.17, 0.17, 0.22, 0.22, 0.22]
                    
requests_processing
More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 548 iterations"
    y-axis "llamacpp:requests_processing"
    x-axis "llamacpp:requests_processing" 1716159196 --> 1716159820
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0, 8.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 8.0, 8.0, 8.0, 8.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 8.0, 8.0, 8.0, 8.0, 8.0, 4.0, 4.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0, 7.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 3.0, 3.0, 3.0]
                    

@mofosyne mofosyne added enhancement New feature or request review complexity : low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix labels May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request review complexity : low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants