Welcome to Our AI Consulting Services

- team naclai

We provide expert AI consulting to help you leverage the power of artificial intelligence in your business.


AI Dev Ops

Steps to Leverage AI in Your Org

Organize Your Data

Organize, organize, organize your data, and make it easy to retrieve. Keeping the same data available in several different structures can make it easier to search. Ask all engineers to submit their Excel files to a database engineer, who will convert the Excel sheets to SQL tables. Excel sheets are isolated from each other, whereas SQL tables can be joined with a query. The simple example below joins 3 isolated tables; more complex joins and schemas are possible. A schema describes the structure of a table, such as its column names and data types. The advantage of SQL over Excel is scale: a SQL database can query millions of rows, while a single Excel worksheet is limited to 1,048,576 rows.

Chemical Data Tables

Example of R&D science lab data (not real)


Boiling Points Table

Chemical Name | Boiling Point (°C)
H2O2 | 151.2
Sulfuric Acid | 337.0

Prices Table

Chemical Name | Price (USD per kg)
H2O2 | 0.50
Sulfuric Acid | 0.20

Shelf Life Table

Chemical Name | Shelf Life (years)
H2O2 | 2
Sulfuric Acid | 5

Chemical Data Result

Chemical Data Table

Chemical Name | Boiling Point (°C) | Price (USD per kg) | Shelf Life (years)
H2O2 | 151.2 | 0.50 | 2
Sulfuric Acid | 337.0 | 0.20 | 5
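
The join that produces the combined table can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production schema: the table names (boiling_points, prices, shelf_life) and column names are assumptions chosen for this example, and the values are the same made-up sample data shown in the tables.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Table and column names are illustrative assumptions.
conn.executescript("""
CREATE TABLE boiling_points (chemical_name TEXT PRIMARY KEY, boiling_point_c REAL);
CREATE TABLE prices         (chemical_name TEXT PRIMARY KEY, price_usd_per_kg REAL);
CREATE TABLE shelf_life     (chemical_name TEXT PRIMARY KEY, shelf_life_years INTEGER);

INSERT INTO boiling_points VALUES ('H2O2', 151.2), ('Sulfuric Acid', 337.0);
INSERT INTO prices         VALUES ('H2O2', 0.50),  ('Sulfuric Acid', 0.20);
INSERT INTO shelf_life     VALUES ('H2O2', 2),     ('Sulfuric Acid', 5);
""")

# Join the three isolated tables on the shared chemical name.
rows = conn.execute("""
SELECT b.chemical_name, b.boiling_point_c, p.price_usd_per_kg, s.shelf_life_years
FROM boiling_points AS b
JOIN prices     AS p ON p.chemical_name = b.chemical_name
JOIN shelf_life AS s ON s.chemical_name = b.chemical_name
ORDER BY b.chemical_name
""").fetchall()

for row in rows:
    print(row)
```

The shared Chemical Name column is what makes the join possible, which is why consistent naming across sheets matters before the conversion to SQL.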


Secure your data. To keep data secure on premises, replace all data entries and column names with arbitrary values and words before sharing them with an external database engineer. This can be done with regular expressions (regex). If you feel that this still does not protect your metadata, add extra columns. Outline your security requirements and develop a process for working with data engineers.

In manufacturing, high-volume data comes from sensors that record real-time data at short intervals, for example every 30 seconds. This data should be stored in a database, either SQL or NoSQL. If the data sources are distributed, consider a framework like Apache Spark, which processes data in a distributed fashion.

Statisticians conduct statistical tests on subsets of the data. Use their tests and human insights to develop a pipeline for collecting statistics on the data. Set up a cron job, which is a program scheduled to run at a fixed interval such as daily or weekly, to run this pipeline. A chemical engineer can then review the summary of the statistics and data collected over time. Use numerical and machine learning (ML) libraries like NumPy and scikit-learn to process the data.
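
The regex-based replacement of column names and entries mentioned above could look roughly like the sketch below. The helper names (build_mapping, pseudonymize, restore) and the token format (col_0, item_0) are hypothetical choices for this example. The sketch only masks names and text labels; numeric values are left untouched and would need their own treatment. The mapping stays on premises so that results can be translated back.

```python
import re

def build_mapping(names, prefix):
    """Map each distinct name to an arbitrary token such as col_0, col_1."""
    return {name: f"{prefix}_{i}" for i, name in enumerate(dict.fromkeys(names))}

def pseudonymize(text, mapping):
    """Replace every key of `mapping` found in `text` with its token."""
    # Longest keys first, so a short name never replaces part of a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

def restore(text, mapping):
    """Undo pseudonymize() using the mapping kept on premises."""
    return pseudonymize(text, {token: name for name, token in mapping.items()})

# Made-up sample data in CSV form.
csv_text = "Chemical Name,Price (USD per kg)\nH2O2,0.50\nSulfuric Acid,0.20\n"

mapping = {
    **build_mapping(["Chemical Name", "Price (USD per kg)"], "col"),
    **build_mapping(["H2O2", "Sulfuric Acid"], "item"),
}

masked = pseudonymize(csv_text, mapping)
print(masked)
```

Only the masked text would be shared externally; calling restore(masked, mapping) on premises recovers the original names.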

Database engineering → Security → Machine Learning

This structure will create a sustainable process to store, retrieve, and analyze data for R&D and Manufacturing.
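
As one piece of this process, the scheduled statistics summary for sensor data might be sketched as follows. The function name daily_summary, the choice of statistics, and the sample readings are assumptions for illustration; in practice the statistics would be the ones your statisticians recommend, and the readings would come from your database.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

def daily_summary(readings):
    """Group (ISO timestamp, value) readings by day and summarize each day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        by_day[day].append(value)
    return {
        day: {
            "count": len(values),
            "mean": mean(values),
            # stdev needs at least two points; report 0.0 for a single reading.
            "stdev": stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
        for day, values in sorted(by_day.items())
    }

# Made-up sensor readings taken 30 seconds apart.
readings = [
    ("2024-01-01T00:00:00", 20.0),
    ("2024-01-01T00:00:30", 22.0),
    ("2024-01-02T00:00:00", 30.0),
]

summary = daily_summary(readings)
for day, stats in summary.items():
    print(day, stats)

# A crontab entry to run such a script every day at 06:00 could look like this
# (the script path is hypothetical):
#   0 6 * * * python3 /path/to/daily_summary.py
```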

Digging Deeper!

Data accumulated from different sources is usually labeled differently, has missing entries, and comes in different shapes and sizes. For ML models, data has to be cleaned and formatted. This cleaning and formatting work is a core part of data science. To make the best use of your time, develop a pipeline where data is fed in and gets cleaned. In the beginning, new methods will have to be added to the cleaning routine. After a while, all the different possibilities will be incorporated into your process.

Common steps

1) Re-labeling columns that mean the same thing but were labeled differently by different people. Example: Temperature, temp, Temp. These tiny differences will prevent SQL or Excel from matching the columns in a join. To implement this, use a phrase or word embedding model to embed each column name as a vector, apply a clustering algorithm to the vectors, and map the vectors in each cluster back to their column names. Column names with the same meaning will then end up in the same cluster, and all the names in one cluster can be replaced with a single column name.

2) Deduplication of data. This is important when training a model, so that it does not overfit to the parts of the data that are repeated.

3) Removing incorrect data. This is difficult and needs a strategy worked out case by case.
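
The column re-labeling idea in step 1 can be sketched with the standard library alone. Note the substitutions: character trigram counts stand in for a real phrase or word embedding model, and a simple greedy similarity threshold stands in for a full clustering algorithm. The function names and the 0.3 threshold are assumptions for this example. Unlike a real embedding model, trigrams only catch names that share characters, not synonyms that are spelled differently.

```python
from collections import Counter
from math import sqrt

def embed(name, n=3):
    """Stand-in for an embedding model: character n-gram counts of the name."""
    text = f" {name.strip().lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_columns(names, threshold=0.3):
    """Greedy clustering: a name joins the first cluster with a similar member."""
    clusters = []
    for name in names:
        vector = embed(name)
        for cluster in clusters:
            if any(cosine(vector, embed(member)) >= threshold for member in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

columns = ["Temperature", "temp", "Temp", "Pressure", "pressure", "press"]
clusters = cluster_columns(columns)

# Replace every name in a cluster with one canonical name (here, the first seen).
rename = {name: cluster[0] for cluster in clusters for name in cluster}
print(clusters)
print(rename)
```

The resulting rename mapping can then be applied to every incoming table before it is loaded, so that joins see one consistent column name per concept.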


