Expertise and knowledge base

Here’s how to upload training data in the expertise and knowledge base training room.

Hey!

 

If you’ve not already watched our intro to prompts and completions, watch that first here.

 

You can upload .pdf, .docx, .txt, and .csv files to your AI. We have some upload FAQs here too.

 

In this Loom video, I’ll guide you through the process of uploading PDF files, giving details about the content type, and chunking it up to create prompts and completions.

 

I’ll also share some tips on the type of content that works best and how to check if the prompts and completions make sense. By the end of this video, you’ll be able to upload your own files and create prompts and completions for your AI model. So, let’s get started!

Here’s some more guidance on what kind of document you should upload to this training for the best results:

In summary, to create the best prompts and completions possible:

 

1. Start by uploading smaller documents to see what they produce.

 

2. Choose documents where the information is presented most clearly.

 

Avoid:

– Unedited transcripts

– Verbose, complicated copy

– Obscure presentation of data

 

Include:

– FAQ documents

– Concise blogs or articles

– Clearly formatted course material

 

3. Use tools like ChatGPT to simplify information if needed.

 

Use a prompt such as “Turn the following information into more succinct and actionable points [paste information]”. Then use this to create a document for upload.

Go back to the training overview page for other training resources.

Upload FAQs

Absolutely not. Quality is far more important than quantity, and uploading everything you’ve ever created is a huge mistake.

 

Focus on recency and relevance.

 

Your clients and members of your audience are likely to ask similar questions. Having excellent answers to these questions and solutions for common challenges is most important.

 

Your content is going to compete to get used, so focus on your most recent work, especially if your views or methodology have evolved over time. For example, if you’ve written two articles on a similar topic a few years apart, only upload the most recent one.

There’s a character limit on files to make managing your training data easier.

 

If you uploaded a 100,000-word manuscript to Coachvox AI, it could produce over 1,000 prompts and completions. Reviewing and editing that much data is very challenging.

 

Breaking longer texts into smaller chunks or chapters means your data is more organised and easier to modify if needed.
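
If you'd like to split a long manuscript before uploading, the idea can be sketched in a few lines of Python. This is only an illustration and not part of Coachvox AI: the function name split_into_chunks and the 20,000-character default are assumptions made for the example, so check the character limit that actually applies to your uploads.

```python
def split_into_chunks(text: str, max_chars: int = 20000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    max_chars is a hypothetical limit chosen for illustration only.
    """
    # Treat blank lines as paragraph boundaries and drop empty paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph longer than max_chars is kept whole,
            # so it would need to be shortened by hand.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be saved as its own .txt file, ideally along chapter or topic boundaries, so the resulting prompts and completions stay easy to review.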

Web pages are complicated. They contain headers, footers, sidebars, images, subtitles, links and metadata. In our experience, data created from web links is suboptimal, and it’s far better to find the specific text from those pages that you want your AI to know and upload it as a text-based file.
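
For anyone comfortable with a little scripting, pulling the body text out of a saved web page might look like the sketch below. It uses Python's built-in html.parser module. The class name MainTextExtractor, the helper extract_main_text and the list of skipped tags are assumptions for this example, and real pages often need a manual tidy-up afterwards.

```python
from html.parser import HTMLParser

# Tags whose contents are usually navigation or page furniture, not body text.
# This list is an assumption; adjust it for the pages you work with.
SKIPPED_TAGS = {"head", "header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    """Return the page's text with one text fragment per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Read the extracted text through before saving it as a .txt file for upload, and remove anything you would not want your AI to repeat.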

Video content tends not to produce good AI training data. Remember that your AI is based on a language model, so only the language used is relevant.

Unscripted videos are hard to make sense of once the context from imagery, body language and interaction is lost.

 

A much better option is to upload the script for a video or podcast, or to produce a transcript and edit it so that it reads the way you’d like your AI to understand it.

These restrictions make the training of your AI more effective. You need to put a little more effort in at the beginning to refine your data, but the result will be superior.

Great question! Check this out: