ML-powered Business Assistant Chatbot

At its core, the system uses NLP techniques with a fine-tuned DistilBERT classifier trained on a custom dataset to recognize six business-related intents, such as generating LinkedIn notes, researching B2B accounts, and extracting company value propositions.
spaCy was integrated for preprocessing and keyword extraction, while pre-trained transformer models were used for sentiment analysis and question answering.
A T5-small model was fine-tuned to generate personalized LinkedIn connection notes. Real-time company research was enabled via APIs and web scraping (LinkedIn, News APIs, Google Search, BeautifulSoup, Selenium).
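The intent-based design above boils down to classify, then dispatch. The sketch below illustrates that routing step under stated assumptions. The label names, the handler functions, the keyword rules in `classify` and the 0.5 confidence threshold are all hypothetical stand-ins. In the real tool the fine-tuned DistilBERT classifier would produce the label and score instead; a Hugging Face `transformers` text-classification pipeline, for example, returns dicts with `label` and `score` keys. Only three of the six intents are named in this README, so only three are shown.

```python
from typing import Callable, Dict, Tuple

FALLBACK_REPLY = "Sorry, I did not understand that. Could you rephrase?"

# Hypothetical stand-ins for the project's task modules.
def generate_linkedin_note(text: str) -> str:
    return f"[note generator] {text}"

def research_b2b_account(text: str) -> str:
    return f"[account research] {text}"

def extract_value_proposition(text: str) -> str:
    return f"[value proposition] {text}"

# Hypothetical intent labels mapped to the function that handles each one.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "linkedin_note": generate_linkedin_note,
    "b2b_research": research_b2b_account,
    "value_proposition": extract_value_proposition,
}

def classify(text: str) -> Tuple[str, float]:
    """Toy keyword rules standing in for the fine-tuned DistilBERT classifier."""
    lowered = text.lower()
    if "linkedin" in lowered:
        return "linkedin_note", 0.9
    if "research" in lowered:
        return "b2b_research", 0.8
    if "value proposition" in lowered:
        return "value_proposition", 0.85
    return "unknown", 0.1

def route(text: str, threshold: float = 0.5) -> str:
    """Send the message to the handler for its intent, or ask the user to rephrase."""
    label, score = classify(text)
    handler = HANDLERS.get(label)
    # Low-confidence or unknown intents fall back instead of guessing.
    if handler is None or score < threshold:
        return FALLBACK_REPLY
    return handler(text)
```

Keeping a confidence threshold in the router means an ambiguous message gets a clarification request rather than being sent to the wrong task.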
For a deeper explanation, see the project documentation.
Python
Streamlit
spaCy
Transformers
BeautifulSoup
Selenium
Natural Language Processing
Web Scraping
API Integration
ML chatbot for business operations
A Streamlit ML application.
For a comprehensive and in-depth explanation of the tool, please refer to the ML_Tool_Explanation.pdf document.
Preparing the environment
Create venv
I suggest creating a virtual environment first, to avoid version conflicts:
python -m venv venv
Then activate it:
venv\Scripts\activate # for Windows
source venv/bin/activate # for Linux/macOS
Project setup
To run this project you need to:
- Install the right version of PyTorch for your machine: https://pytorch.org/get-started/locally/
- Install the spaCy English model:
python -m spacy download en_core_web_lg
If you want faster execution at the cost of some accuracy, you can install en_core_web_sm or en_core_web_md instead. If you switch models, you also have to change the model name passed to spaCy's load function in utility/nlp.py.
- Set up ChromeDriver (make sure chromedriver is installed and accessible, e.g. on your PATH).
- Install the requirements with:
pip install -r requirements.txt
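Because the spaCy model name is otherwise hard-coded, one option is to pick the most accurate English model that is actually installed. The sketch below assumes the three model names listed above and relies on the fact that `spacy download` installs each model as an importable package; the function name `pick_spacy_model` is hypothetical and not part of utility/nlp.py.

```python
import importlib.util
from typing import Callable, Optional, Sequence

# Ordered from most accurate (slowest) to fastest (least accurate).
SPACY_MODELS = ("en_core_web_lg", "en_core_web_md", "en_core_web_sm")

def _is_installed(package: str) -> bool:
    """True if the package can be imported, without actually importing it."""
    return importlib.util.find_spec(package) is not None

def pick_spacy_model(
    candidates: Sequence[str] = SPACY_MODELS,
    is_installed: Callable[[str], bool] = _is_installed,
) -> Optional[str]:
    """Return the first installed model name, or None if none is available."""
    for name in candidates:
        if is_installed(name):
            return name
    return None
```

The result would then be passed to `spacy.load(name)`; if it is `None`, the user still needs to run the `python -m spacy download` command above.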
Prepare the models
You can train the models by running the Jupyter notebook train_models.ipynb.
Training can take a long time; alternatively, you can download the trained models from this link.
Extract the folder and put it in the root of the project, like this:
- AI-agent-for-Business/
- models/
- chatbot_model/
- fine_tuned_model/
- note_model/
- fine_tuned_model/
- chatbot_model/
- custom_dataset/
- tasks/
- utility/
- ...
- models/
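Before starting the tool it can help to check that the extracted models landed in the right place. The indentation of the folder tree above was lost, so this sketch assumes the two fine-tuned models live at models/chatbot_model/fine_tuned_model and models/note_model/fine_tuned_model; adjust `EXPECTED_MODEL_DIRS` (a hypothetical name) if your layout differs.

```python
from pathlib import Path
from typing import List

# Assumed model locations, relative to the project root.
EXPECTED_MODEL_DIRS = (
    "models/chatbot_model/fine_tuned_model",
    "models/note_model/fine_tuned_model",
)

def missing_model_dirs(project_root: str = ".") -> List[str]:
    """Return the expected model folders that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_MODEL_DIRS if not (root / rel).is_dir()]
```

An empty list means both folders were found; anything else points to the folder that still needs to be trained or extracted.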
Environment variables
For this project I'm using several APIs on their free plans:
- RapidAPI Real-Time LinkedIn Scraper API (https://rapidapi.com/rockapis-rockapis-default/api/linkedin-api8) (https://rapidapi.com/rockapis-rockapis-default/api/linkedin-data-api)
It's possible to use just one of them, but because of the free-plan limits one is used for profile data and the other for company searches.
- NewsAPI (https://newsapi.org/docs)
- Google Cloud (https://console.cloud.google.com/) with the Google Search API (https://programmablesearchengine.google.com/)
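As an illustration of how the news and search lookups are addressed, the sketch below only builds request URLs for NewsAPI's `/v2/everything` endpoint and the Google Custom Search JSON API; it does not send them, and the project's own request code may use different parameters. The chosen defaults (English results, newest first, page and result sizes) are assumptions. The RapidAPI LinkedIn endpoints are not shown because this README does not document them.

```python
from urllib.parse import urlencode

NEWSAPI_EVERYTHING = "https://newsapi.org/v2/everything"
GOOGLE_CUSTOM_SEARCH = "https://www.googleapis.com/customsearch/v1"

def newsapi_url(query: str, api_key: str, page_size: int = 10) -> str:
    """Build a NewsAPI 'everything' search URL (newest English articles first)."""
    params = {
        "q": query,
        "language": "en",
        "sortBy": "publishedAt",
        "pageSize": page_size,
        "apiKey": api_key,  # NewsAPI also accepts the key in an X-Api-Key header
    }
    return f"{NEWSAPI_EVERYTHING}?{urlencode(params)}"

def google_search_url(query: str, api_key: str, cx: str, num: int = 5) -> str:
    """Build a Custom Search JSON API URL; cx is the Programmable Search Engine ID."""
    params = {"key": api_key, "cx": cx, "q": query, "num": num}
    return f"{GOOGLE_CUSTOM_SEARCH}?{urlencode(params)}"
```

The URLs can then be fetched with `urllib.request.urlopen` or the `requests` library, and the JSON responses parsed for article titles or search results.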
Create a .env file with your API keys in the main directory of the project, using this structure:
NEWSAPI_API_KEY=API_KEY
LINKEDIN_RAPIDAPI_API_KEY=API_KEY
LINKEDIN_RAPIDAPI_FOR_COMPANIES_API_KEY=API_KEY
GOOGLE_CLOUD_API_KEY=API_KEY
GOOGLE_SEARCH_CX=CX
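A missing key usually only shows up as a failed API call at runtime, so a quick check of the .env file can save time. This is a minimal standard-library sketch that assumes the simple `KEY=value` format shown above; the function names are hypothetical. In many projects `load_dotenv()` from the python-dotenv package is used to load the file into the environment, but this README does not say whether that is the case here.

```python
from typing import Dict, Iterable, List

# The variable names listed in the .env structure above.
REQUIRED_KEYS = (
    "NEWSAPI_API_KEY",
    "LINKEDIN_RAPIDAPI_API_KEY",
    "LINKEDIN_RAPIDAPI_FOR_COMPANIES_API_KEY",
    "GOOGLE_CLOUD_API_KEY",
    "GOOGLE_SEARCH_CX",
)

def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Usage would be along the lines of reading the file with `open(".env").read()`, passing the text to `parse_env`, and stopping with a clear message if `missing_keys` returns anything.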
Starting the Tool
Once everything is set up, there are two ways to run the tool: from the command line or as a Streamlit app.
- To run the chatbot in the command line:
python chatbot.py
- To run the chatbot as a Streamlit app:
streamlit run streamlit.py
This automatically opens a browser tab with the application; otherwise you can open it manually at http://localhost:8501/
The Streamlit app is preferred because it has an intuitive UI, while the command-line version may also print some logs.
Tool schema
General chatbot workflow schema
Chatbot internal operation
ML-Powered Suggestions
How it works:
I use a MiniLM-L6 transformer model to turn project descriptions into embedding vectors.
By calculating the cosine similarity between these vectors, the system identifies semantically related work.
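The similarity step itself is small. In this sketch, short hand-written vectors stand in for real MiniLM-L6 embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors, which could come from something like `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`); the project names and the `most_similar` helper are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); assumes both vectors have the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def most_similar(
    query: Sequence[float], others: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank the other projects by cosine similarity to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in others.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Cosine similarity compares only the direction of the vectors, not their length, which is why it suits sentence embeddings where meaning is encoded in direction.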