{"payload":{"pageCount":1,"repositories":[{"type":"Public","name":"sanpo_dataset","owner":"google-research-datasets","isFork":false,"description":"","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":1,"issueCount":3,"starsCount":38,"forksCount":1,"license":"Apache License 2.0","participation":[0,0,0,0,0,0,0,0,0,3,2,1,1,0,1,4,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2024-05-16T21:44:19.252Z"}},{"type":"Public","name":"cpcd","owner":"google-research-datasets","isFork":false,"description":"The Conversational Playlist Creation Dataset (CPCD) contains 917 conversations between two people where users express preferences over sets of songs in natural language and wizards to elicit preferences from users. The dataset includes per-song ratings and can be used to design and evaluate conversational recommendation systems.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":1,"issueCount":1,"starsCount":9,"forksCount":3,"license":null,"participation":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2024-05-03T20:40:05.465Z"}},{"type":"Public","name":"QuoteSum","owner":"google-research-datasets","isFork":false,"description":"QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on Wikipedia passages.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":8,"forksCount":1,"license":"Creative Commons Attribution Share Alike 4.0 International","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2024-03-25T03:47:22.252Z"}},{"type":"Public","name":"Synthetic-Persona-Chat","owner":"google-research-datasets","isFork":false,"description":"The Synthetic-Persona-Chat dataset is a synthetically generated persona-based dialogue dataset. It extends the original Persona-Chat dataset. ","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":41,"forksCount":3,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2024-01-02T19:27:27.793Z"}},{"type":"Public","name":"india-soil-health-card","owner":"google-research-datasets","isFork":false,"description":"","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":1,"issueCount":0,"starsCount":0,"forksCount":1,"license":"Apache License 2.0","participation":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,17,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2023-12-27T19:18:23.934Z"}},{"type":"Public","name":"LaGOT","owner":"google-research-datasets","isFork":false,"description":"We enrich the LaSOT validation set with annotations of additional object tracks, up to 10 object tracks per video in total. Tracks consist of precise bounding box annotations of moving objects. Annotations are provided at 10 fps. The original LaSOT validation set annotations and video can be downloaded from: https://vision.cs.stonybrook.edu/~lasot/","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":6,"forksCount":1,"license":"Creative Commons Attribution 4.0 International","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2023-12-23T11:02:24.785Z"}},{"type":"Public","name":"dstc8-schema-guided-dialogue","owner":"google-research-datasets","isFork":false,"description":"The Schema-Guided Dialogue Dataset","allTopics":["dialogue","assistant","nlp-machine-learning","dialogue-systems","dataset"],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":3,"starsCount":526,"forksCount":122,"license":"Creative Commons Attribution Share Alike 4.0 International","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2023-08-07T23:10:50.379Z"}},{"type":"Public","name":"Attributed-QA","owner":"google-research-datasets","isFork":false,"description":"We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in information-seeking scenarios. This release consists of human-rated system outputs for a new question-answering task, Attributed Question Answering (AQA).","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":2,"starsCount":40,"forksCount":9,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2023-07-28T22:10:57.636Z"}},{"type":"Public","name":"RxR","owner":"google-research-datasets","isFork":false,"description":"Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual per…","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":6,"starsCount":105,"forksCount":12,"license":"Creative Commons Attribution 4.0 International","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2023-07-26T19:13:04.254Z"}},{"type":"Public archive","name":"imagenet_pi","owner":"google-research-datasets","isFork":false,"description":"ImageNet-PI is a relabelled version of the standard ILSVRC2012 ImageNet dataset in which the labels are provided by a collection of 16 deep neural networks with different architectures pre-trained on the standard ILSVRC2012. Specifically, the pre-trained models are downloaded from tf.keras.applications.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":2,"forksCount":1,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2023-02-20T13:55:18.989Z"}},{"type":"Public archive","name":"MAVE","owner":"google-research-datasets","isFork":false,"description":"The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attribute extraction study.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":6,"starsCount":134,"forksCount":21,"license":"Other","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2022-12-16T00:44:02.752Z"}},{"type":"Public","name":"clse","owner":"google-research-datasets","isFork":false,"description":"The Corpus of Linguistically Significant Entities (CLSE) is a dataset of named entities annotated by linguist experts. It includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. The aim of the corpus is to facilitate the creation of more linguistically diverse NLG datasets.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":7,"forksCount":1,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2022-11-28T09:23:15.609Z"}},{"type":"Public","name":"turkish-treebanks","owner":"google-research-datasets","isFork":false,"description":"A human-annotated morphosyntactic treebank for Turkish.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":1,"starsCount":31,"forksCount":2,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2022-11-18T19:08:27.605Z"}},{"type":"Public archive","name":"WikipediaAbbreviationData","owner":"google-research-datasets","isFork":false,"description":"This data set consists of 24,000 English sentences, extracted from Wikipedia in 2017, annotated to support development of an abbreviation expansion system for text-to-speech synthesis (e.g., a systm tht cn prnounc txt lk ths).","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":9,"forksCount":1,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2022-10-04T14:01:36.029Z"}},{"type":"Public","name":"clang8","owner":"google-research-datasets","isFork":false,"description":"cLang-8 is a dataset for grammatical error correction.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":10,"starsCount":99,"forksCount":5,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2022-07-19T18:18:26.448Z"}},{"type":"Public","name":"paws","owner":"google-research-datasets","isFork":false,"description":"This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identification.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":1,"issueCount":10,"starsCount":540,"forksCount":52,"license":"Other","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2022-01-04T18:59:03.137Z"}},{"type":"Public","name":"C4_200M-synthetic-dataset-for-grammatical-error-correction","owner":"google-research-datasets","isFork":false,"description":"This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/)","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":152,"forksCount":25,"license":"Creative Commons Attribution 4.0 International","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2021-12-07T12:53:31.751Z"}},{"type":"Public","name":"QED","owner":"google-research-datasets","isFork":false,"description":"QED: A Framework and Dataset for Explanations in Question Answering","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":3,"starsCount":114,"forksCount":17,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2021-08-03T15:58:12.543Z"}},{"type":"Public","name":"natural-questions","owner":"google-research-datasets","isFork":false,"description":"Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":3,"issueCount":13,"starsCount":895,"forksCount":152,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2021-07-30T00:22:08.609Z"}},{"type":"Public","name":"Nutrition5k","owner":"google-research-datasets","isFork":false,"description":"Detailed visual + nutritional data for over 5,000 plates of food.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":8,"starsCount":144,"forksCount":24,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2021-06-23T16:44:54.777Z"}},{"type":"Public","name":"gap-coreference","owner":"google-research-datasets","isFork":false,"description":" GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia for the evaluation of coreference resolution in practical applications.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":1,"issueCount":5,"starsCount":223,"forksCount":81,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2020-11-10T12:00:12.707Z"}},{"type":"Public","name":"Crisscrossed-Captions","owner":"google-research-datasets","isFork":false,"description":"Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":1,"starsCount":48,"forksCount":3,"license":"Other","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2020-09-03T19:41:44.523Z"}},{"type":"Public","name":"bam","owner":"google-research-datasets","isFork":false,"description":"","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":4,"starsCount":48,"forksCount":8,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2020-08-29T23:45:47.923Z"}},{"type":"Public","name":"tydiqa","owner":"google-research-datasets","isFork":false,"description":"TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":1,"issueCount":2,"starsCount":288,"forksCount":41,"license":"Apache License 2.0","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2020-05-28T09:54:36.909Z"}},{"type":"Public","name":"coarse-discourse","owner":"google-research-datasets","isFork":false,"description":"A large corpus of discourse annotations and relations on ~10K forum threads.","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":0,"starsCount":238,"forksCount":33,"license":null,"participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2018-12-17T01:56:53.014Z"}},{"type":"Public archive","name":"wiki-reading","owner":"google-research-datasets","isFork":false,"description":"This repository contains the three WikiReading datasets as used and described in WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia, Hewlett, et al, ACL 2016 (the English WikiReading dataset) and Byte-level Machine Reading across Morphologically Varied Languages, Kenter et al, AAAI-18 (the Turkish and Russian datasets).","allTopics":[],"primaryLanguage":{"name":"Python","color":"#3572A5"},"pullRequestCount":0,"issueCount":1,"starsCount":270,"forksCount":32,"license":"Other","participation":null,"lastUpdated":{"hasBeenPushedTo":true,"timestamp":"2018-05-17T00:39:59.617Z"}}],"repositoryCount":26,"userInfo":null,"searchable":true,"definitions":[],"typeFilters":[{"id":"all","text":"All"},{"id":"public","text":"Public"},{"id":"source","text":"Sources"},{"id":"fork","text":"Forks"},{"id":"archived","text":"Archived"},{"id":"template","text":"Templates"}],"compactMode":false},"title":"Repositories"}