Organizing My Own Data for AI
In our last article, we talked about identifying data for use in your personal AI project. This time we go over organizing that data so you get the most out of it as a source for your AI.
You could just throw all your data into the AI and hope for the best, but that will cause problems down the line and might even turn into an immediate mess. It’s best to take a bit of time to review the data and find a way to organize it that suits your structure.
Accessing
While you were identifying your data, you might have come across some sources where access was difficult or cumbersome. This is often true of cloud services and can sometimes be fixed with a little digging into their documentation for ways to export the data or even use APIs to retrieve a copy. A technique called “screen scraping,” where a program reads the data off the same pages you would view, could also be used. Avoid manually re-entering the data unless it’s the highest priority. The goal is to be able to get that data reliably and consistently.
Categorizing
Think of how this data can be grouped. Don’t get hung up on finding the best category. Assign as many categories as you feel you might benefit from.
Typical categories might answer:
• What it is - Customer list, Inventory, How-to docs, etc.
• Who it’s for - Sales, Production, Customer Service, etc.
• Why it’s important - Regulations, Safety, Procedures, etc.
• When it’s relevant - Dates are often part of the data, but “busy season” or “understaffed” might provide better insight later
• Where it happened - Store location, Overseas supplier, etc.
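The questions above can be sketched as a small tagging structure. This is only an illustration: the class name, field names, and file names are hypothetical, and a spreadsheet with the same columns would work just as well.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One piece of data and the categories assigned to it (hypothetical layout)."""
    name: str
    what: list = field(default_factory=list)   # What it is
    who: list = field(default_factory=list)    # Who it's for
    why: list = field(default_factory=list)    # Why it's important
    when: list = field(default_factory=list)   # When it's relevant
    where: list = field(default_factory=list)  # Where it happened

    def all_tags(self):
        """Collect every category assigned to this source into one set."""
        return set(self.what + self.who + self.why + self.when + self.where)


def find_by_tag(sources, tag):
    """Return the names of all sources that carry the given category."""
    return [s.name for s in sources if tag in s.all_tags()]


# Example entries; a source can carry as many categories as seem useful.
sources = [
    DataSource("customers.csv", what=["Customer list"],
               who=["Sales", "Customer Service"]),
    DataSource("safety_manual.pdf", what=["How-to docs"],
               who=["Production"], why=["Safety", "Regulations"],
               when=["busy season"]),
]

print(find_by_tag(sources, "Sales"))
print(find_by_tag(sources, "Safety"))
```

Because each question holds a list, a source does not need one "best" category. It can simply be listed under every category that applies.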
Relating
Finding relationships between different types of data can help the AI provide valuable insights later. Many of these relationships might seem simple and intuitive to those who work with them every day, but it helps to spell them out for the AI. This can be a gradual process, so don’t try to capture everything in the first round. You can even ask your AI to find some relationships once it’s up and running.
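One simple way to spell out relationships is a list of plain statements of the form "this, relates to, that." The sketch below assumes hypothetical file names and relationship wording. It shows one possible way to write them down, not a required format.

```python
# Each relationship is a (source, relation, target) triple.
# All names below are made-up examples.
relationships = [
    ("orders.csv", "references customers in", "customers.csv"),
    ("orders.csv", "references products in", "inventory.xlsx"),
    ("returns_policy.docx", "applies to", "orders.csv"),
]


def describe_links(name, links):
    """Return a readable sentence for every relationship that mentions name."""
    sentences = []
    for source, relation, target in links:
        if name in (source, target):
            sentences.append(f"{source} {relation} {target}")
    return sentences


for sentence in describe_links("orders.csv", relationships):
    print(sentence)
```

New triples can be appended at any time, which suits the idea of building the list up over several rounds.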
Securing
Some of your data may contain sensitive or secret information. There are a few ways to handle this:
• Encrypting – If you are moving sensitive data around, it should be encrypted. Systems that you use should also store some data (like credit card information) in an encrypted format.
• Anonymizing – You certainly don’t want to have Personally Identifiable Information exposed. Anonymizing the data will keep this information out while retaining the ability to find patterns and trends. We will address this more next time.
• Omitting – You might have identified some document or data that just shouldn’t be available. Maybe you keep passwords in a spreadsheet (please don’t do that! Switch to a good password manager). Leave that out or consider a different way to store it.
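As a rough sketch of the last two options, the example below drops fields that should be omitted and replaces identifying fields with a keyed hash from Python's standard `hmac` and `hashlib` modules. The field names, the sample record, and the placeholder key are all assumptions made for illustration. Strictly speaking, a keyed hash is pseudonymizing rather than fully anonymizing, because anyone holding the key can recompute the same tokens.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real key should be stored outside the data set.
SECRET_KEY = b"replace-with-a-secret-kept-somewhere-safe"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a short, repeatable token using HMAC-SHA256."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


def scrub_record(record, pii_fields, drop_fields):
    """Return a copy of record with drop_fields removed and pii_fields tokenized."""
    clean = {}
    for field_name, value in record.items():
        if field_name in drop_fields:
            continue  # Omitting: leave this field out entirely
        if field_name in pii_fields:
            clean[field_name] = pseudonymize(str(value))  # hide the identity
        else:
            clean[field_name] = value
    return clean


# Made-up sample record.
record = {"email": "Pat@example.com", "password": "not-a-real-password",
          "store": "Store 1", "total": 42.5}
clean = scrub_record(record, pii_fields={"email"}, drop_fields={"password"})
print(clean)
```

Because the same email always produces the same token, patterns such as repeat purchases remain visible without the address itself appearing in the data.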
Getting all your data organized can be involved, but gathering all this information about your data creates a master index for everything.
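A master index can be as simple as one entry per data source that records what was learned in each step. The layout below is a minimal sketch. The keys (`access`, `categories`, `related_to`, `security`) and the sample entries are hypothetical, chosen here to mirror the sections of this article.

```python
import json

# One entry per data source; every value here is a made-up example.
master_index = [
    {
        "name": "customers.csv",
        "access": "weekly export from the CRM",
        "categories": ["Customer list", "Sales"],
        "related_to": ["orders.csv"],
        "security": "anonymize",
    },
    {
        "name": "passwords.xlsx",
        "access": "shared drive",
        "categories": ["Credentials"],
        "related_to": [],
        "security": "omit",
    },
]


def to_json(index):
    """Serialize the index to readable JSON text that can be saved or shared."""
    return json.dumps(index, indent=2)


def by_security(index, handling):
    """List the names of sources that need the given security handling."""
    return [entry["name"] for entry in index if entry["security"] == handling]


print(to_json(master_index))
print(by_security(master_index, "omit"))
```

Keeping the index in a plain text format like JSON makes it easy to update by hand and easy to hand to other tools later.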
What should you do with this master index? Gold star if you said to add it to the AI. There are probably still some issues that need to be addressed, and you want to make sure all the data is accurate and clean. That’s coming next.
Tech Tip: Keeping your Data Master Index updated will help when you want to change a policy or switch to a different software system.
– Jim Hundley, ASCENDalyst.com, 727-346-6020