Sarthak Malhotra

zarmalhotra

AI & ML interests

None yet

Recent Activity

new activity 9 days ago

UCSC-VLAA/MedReason: 3rd prize in reasoning dataset competition. Congratulations

new activity 15 days ago

rekrek/subset_arcee-10k: Congratulations on winning Curator Spotlight in Reasoning Dataset Competition

updated a Space 15 days ago

reasoning-datasets-competition/README

zarmalhotra's activity

New activity in UCSC-VLAA/MedReason 9 days ago

3rd prize in reasoning dataset competition. Congratulations

#3 opened 9 days ago by

zarmalhotra

New activity in rekrek/subset_arcee-10k 15 days ago

Congratulations on winning Curator Spotlight in Reasoning Dataset Competition

#2 opened 15 days ago by

zarmalhotra

updated a Space 15 days ago

README

liked 7 datasets 22 days ago

liked a dataset 23 days ago

shb777/RelatLogic-Reasoning

Viewer • Updated May 2 • 115 • 69 • 2

liked a dataset 24 days ago

VedantPadwal/quantitative-finance-reasoning

Viewer • Updated May 1 • 128 • 204 • 5

liked a dataset 25 days ago

rekrek/reasoning-engaging-story

Viewer • Updated 13 days ago • 220 • 9.96k • 3

liked 4 datasets 27 days ago

codelion/math500-cot-experiment

Viewer • Updated Apr 30 • 1.5k • 161 • 5

DataTonic/dark_thoughts_case_study_reason

Viewer • Updated 27 days ago • 39.7k • 198 • 8

DataTonic/dark_thoughts_case_study_merged

Viewer • Updated 27 days ago • 40.9k • 147 • 5

strickvl/counterfactual_history_reasoning

Viewer • Updated 28 days ago • 100 • 296 • 6

liked a dataset 30 days ago

patrickfleith/instruction-freak-reasoning

Viewer • Updated 14 days ago • 179 • 188 • 4

reacted to ZennyKenny's post with 🔥 about 1 month ago

Post

When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥

With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.

In response to that, I'm happy to present my final submission to the Reasoning Dataset Competition and attempt to start benchmarking the ability of LLMs to identify unsafe and / or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset

Currently a curated set of 200 examples, calibrated on OpenAI's standard-issue models (GPT-4.1, o4-mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful, or hit the Community section with suggestions / critiques.

2 replies

New activity in reasoning-datasets-competition/README about 1 month ago

Competition Lobby

❤️ 3

#1 opened about 2 months ago by

ZennyKenny