Suggestion: publishing (parts of the) training data

by FlipTip - opened 11 days ago

Hi IBM Team,
Thanks for the permissive license.

Are there any plans to release the training data, or parts of it, to help the community gain deeper insights into the model? Alternatively, sharing the synthetic data generation pipeline would also go a long way towards better understanding.

Community-driven open-source AI thrives on transparency; it accelerates collaborative research.

Thanks again for your contributions to the AI research community.

gabegoodhart

IBM Granite org 11 days ago

Hi @FlipTip, thanks for bringing up the topic of data transparency. A detailed description of the training data will be released in the final whitepaper when the full set of 4.0 models is launched, so stay tuned!
