"BullshitBench presents models with prompts that sound technical but collapse under scrutiny, testing their ability to push back against nonsense or proceed confidently."
"One example question asks about the viscosity of a deal pipeline and the revenue throughput transition, illustrating the absurdity of the prompts used in the benchmark."
"Another question humorously inquires about attributing quarterly EBITDA variance to the font weight of invoice templates versus the color palette of financial dashboards."
"The project has quickly gained traction, with over 1,200 stars on GitHub, reflecting interest in how AI handles nonsensical yet technical-sounding inquiries."
BullshitBench is a new AI benchmark created to evaluate whether large language models can recognize nonsensical questions. Developed by Peter Gostev, the project features absurd prompts that sound technical but lack substance. Since its launch, BullshitBench has gained popularity, amassing over 1,200 stars on GitHub. The questions are designed to test whether AI models can distinguish credible questions from confident-sounding nonsense. Examples include humorous inquiries spanning finance, law, and medical scenarios, highlighting the absurdity of the prompts.
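The core idea — scoring whether a model challenges a nonsensical premise rather than answering it at face value — can be sketched with a simple keyword heuristic. This is a hypothetical illustration, not BullshitBench's actual grading code; the marker phrases and function names below are invented for the example.

```python
# Hypothetical sketch of grading whether a model "pushes back" on a
# nonsense prompt. NOT BullshitBench's real scoring logic; the phrase
# list and names are invented for illustration only.

PUSHBACK_MARKERS = [
    "doesn't make sense",
    "not a meaningful",
    "no such metric",
    "isn't a real",
    "could you clarify",
    "this premise",
]

def pushes_back(response: str) -> bool:
    """Return True if the response appears to challenge the premise."""
    lowered = response.lower()
    return any(marker in lowered for marker in PUSHBACK_MARKERS)

def score(responses: list[str]) -> float:
    """Fraction of responses that push back on the nonsense prompt."""
    if not responses:
        return 0.0
    return sum(pushes_back(r) for r in responses) / len(responses)
```

In practice a real evaluation would likely use a stronger judge (e.g. another model grading the response) rather than keyword matching, but the pass/fail framing is the same: did the model flag the question as meaningless, or answer it confidently?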
Read at www.businessinsider.com