Python SDK for running evaluations on LLM-generated responses
The LLM Evaluation Framework
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
Counting-Stars (★)
Open source Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
Evaluates neuron segmentations in terms of statistics related to the number of splits and merges
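As an illustration of what such statistics look like (a generic sketch, not this package's API), splits and merges can be counted from the overlap between ground-truth and predicted segment labels:

```python
# Illustrative only: count splits and merges from label overlap.
import numpy as np

def split_merge_counts(gt: np.ndarray, pred: np.ndarray):
    pairs = set(zip(gt.ravel().tolist(), pred.ravel().tolist()))
    gt_to_pred, pred_to_gt = {}, {}
    for g, p in pairs:
        gt_to_pred.setdefault(g, set()).add(p)
        pred_to_gt.setdefault(p, set()).add(g)
    # a ground-truth segment covered by >1 predicted segments is split;
    # a predicted segment covering >1 ground-truth segments is a merge
    splits = sum(len(v) - 1 for v in gt_to_pred.values())
    merges = sum(len(v) - 1 for v in pred_to_gt.values())
    return splits, merges

gt   = np.array([[1, 1, 2, 2]])
pred = np.array([[1, 3, 3, 3]])  # segment 1 split; segments 1 and 2 merged
print(split_merge_counts(gt, pred))  # -> (1, 1)
```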
Python client for Kolena's machine learning testing platform
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
pip-compatible CodeBLEU metric implementation, available for Linux/macOS/Windows
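Usage is typically a single call; the sketch below assumes the `codebleu` package from PyPI, and the exact signature may vary by version:

```python
# Assumed usage of the PyPI `codebleu` package; the argument order and
# names are from memory of its README and may differ across versions.
from codebleu import calc_codebleu

references  = ["def add(a, b):\n    return a + b"]
predictions = ["def add(x, y):\n    return x + y"]

result = calc_codebleu(references, predictions, lang="python")
print(result)  # dict with the CodeBLEU score and its component scores
```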
Random Forest Assisted Suggestions for Salifort Motors Employee Retention: Plan, Analyze, Construct and Execute
FAISS and Annoy indexing + search evaluation workflow
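A minimal sketch of such a workflow, assuming FAISS is installed: build an exact index as ground truth, an approximate one under test, and compare recall@k (all data here is synthetic):

```python
# Hypothetical evaluation sketch: recall@k of an approximate FAISS index
# against exact brute-force search. Dataset and parameters are made up.
import numpy as np
import faiss

d, n, nq, k = 64, 10_000, 100, 10
rng = np.random.default_rng(0)
xb = rng.standard_normal((n, d)).astype("float32")   # database vectors
xq = rng.standard_normal((nq, d)).astype("float32")  # query vectors

exact = faiss.IndexFlatL2(d)          # brute-force ground truth
exact.add(xb)
_, gt = exact.search(xq, k)

approx = faiss.IndexHNSWFlat(d, 32)   # approximate HNSW index
approx.add(xb)
_, ann = approx.search(xq, k)

# recall@k: fraction of true nearest neighbors recovered per query
recall = np.mean([len(set(a) & set(g)) / k for a, g in zip(ann, gt)])
print(f"recall@{k}: {recall:.3f}")
```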
Official repository for “PATE: Proximity-Aware Time series anomaly Evaluation”.
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
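The library's headline idea is robust aggregation across runs, such as the interquartile mean (IQM); a plain-numpy sketch of that statistic (illustrative, not the library's own API):

```python
# Illustrative only: the interquartile mean (IQM) is one of the robust
# aggregate metrics this work popularized for few-seed evaluation.
import numpy as np

def iqm(scores: np.ndarray) -> float:
    """Mean of the middle 50% of scores (robust to outlier seeds)."""
    q25, q75 = np.percentile(scores, [25, 75])
    middle = scores[(scores >= q25) & (scores <= q75)]
    return float(middle.mean())

# e.g. final scores from five seeds on one task
print(iqm(np.array([0.62, 0.58, 0.91, 0.60, 0.15])))
```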
XGBoost Predictive Model for TikTok's Claim Classification: EDA, Hypothesis Testing, Logistic Regression, Tree-Based Models
Open-Source Evaluation for GenAI Application Pipelines
Metrics to evaluate the quality of responses from your Retrieval-Augmented Generation (RAG) applications.
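This description matches the ragas project; assuming that package (and its roughly 2024-era API, which may since have changed), an evaluation looks like the following. Note the metrics are themselves LLM-judged, so an LLM backend (e.g. an OpenAI API key) is required:

```python
# Assumed ragas usage; metric names and the evaluate() signature are
# from memory of the ~2024 API and may differ in current releases.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

ds = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer":   ["Paris is the capital of France."],
    "contexts": [["Paris has been the capital of France since 987."]],
})
print(evaluate(ds, metrics=[faithfulness, answer_relevancy]))
```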
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
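For intuition, two of the listed metrics are simple enough to compute by hand; a dependency-free sketch for 8-bit images (the package itself wraps all eight):

```python
# Sketch of RMSE and PSNR for 8-bit images; synthetic data for the demo.
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    err = rmse(a, b)
    return float("inf") if err == 0 else 20 * np.log10(max_val / err)

img1 = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
img2 = np.clip(img1 + np.random.randint(-10, 10, (64, 64)), 0, 255).astype(np.uint8)
print(rmse(img1, img2), psnr(img1, img2))
```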
Continuation of the abandoned fast-coco-eval project