Kathan Joshi Portfolio

Hi,
I'm Kathan Joshi


About Me

Hello, I’m Kathan Joshi, an AI/ML engineer and product-minded data scientist with an M.S. in Computer Science from Penn State. I currently lead AI at WinWin Labs, where I design agentic workflows, integrate RAG pipelines, build reusable agentic AI APIs and hooks, and craft prompts for an AI recruiting platform. Previously, I built ML systems across healthcare, manufacturing, and space research: predicting hospital length of stay from 60M+ patient records, computer-vision quality control in factories, and reinforcement-learning navigation at ISRO. Outside of work, you’ll usually find me on a badminton or volleyball court, out for a run, or experimenting with new model architectures and evals.

Experience

Aug 2025 – Present

AI Product Manager & Lead AI Engineer - WinWin Labs

Defined product vision, roadmap, user stories, KPIs, and OKRs for an agentic AI recruiting platform; used RICE scoring and Agile/Scrum practices to prioritize the backlog, achieving ~95% on-time delivery and boosting recruiter throughput by ~12%.
Designed multi-agent AI architectures leveraging LLMs (GPT-4, Qwen) with retrieval-augmented generation, tool/function calling, and fallback policies; implemented agent graphs in LangChain/LangGraph and Neo4j, improving recall@10 by ~24% and answer accuracy by ~18%.
Built scalable agentic RAG pipelines and chatbots using LangChain/LangGraph with vector databases (Pinecone, Weaviate) and PostgreSQL (ORM tool calling); optimized chunking, embeddings, caching, and parallel processing to cut response latency by ~28–30% while stabilizing answer quality.
Engineered robust ML and data pipelines with Python, SQL, Airflow, and Postgres to ingest, clean, and unify structured and unstructured data, reducing ETL cycle time by ~40% and improving data quality for analytics and model training.
Developed evaluation harnesses and A/B testing frameworks measuring recall@k, MRR, and faithfulness; designed guardrails for PII redaction, routing, and error handling, reducing hallucinations by ~15% and improving response quality by ~12 percentage points.
Exposed AI capabilities via FastAPI-based microservices on AWS/GCP with Pydantic-validated schemas, and built a React front end for visual testing and operations, enabling non-technical stakeholders to inspect and validate agent behavior.
Partnered closely with project managers, GTM leaders, UX designers, market researchers, marketers, and software engineers to translate user needs into PRDs, conduct user and market research, and communicate insights via dashboards and reports, accelerating generative-AI adoption across the product.
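The retrieval metrics named in the evaluation-harness bullet (recall@k and MRR) have standard definitions. The minimal Python sketch below assumes each query yields a ranked list of retrieved document IDs plus a hand-labelled set of relevant IDs; the function names and toy data are hypothetical and are not taken from the WinWin Labs codebase.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over a batch of (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy example: two queries with hypothetical relevance labels.
    q1 = (["d3", "d1", "d7"], {"d1", "d9"})
    q2 = (["d2", "d5", "d4"], {"d2"})
    print(recall_at_k(*q1, k=3))           # 0.5
    print(mean_reciprocal_rank([q1, q2]))  # 0.75
```

Recall@k rewards getting all relevant chunks into the context window, while MRR rewards ranking the first relevant chunk near the top; tracking both helps separate retrieval coverage from ranking quality.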

Aug 2023 – May 2025

Machine Learning Research Assistant/TA - Penn State

Executed end-to-end exploratory data analysis on 60M+ HCUP healthcare records using Python, SQL, and Dask; designed a five-step pipeline (ingestion, cleaning, feature engineering, modeling, evaluation) with complete data lineage and metadata documentation, improving model stability by ~38% and reducing data inconsistencies by ~45%.
Built Tableau story dashboards that surfaced enrollment- and outcome-relevant insights (e.g., demographics and length-of-stay trends); boosted predictive accuracy by ~25% while providing clear visual storytelling for clinicians and other non-technical stakeholders.
Benchmarked and fine-tuned a suite of models (AdaBoost, CatBoost, XGBoost, Random Forest, Kolmogorov–Arnold Networks, Multiple Linear Regression, Neural Networks) to predict patient length of stay, improving R² from 0.43 to 0.61 and reducing RMSE from 2.11 to 1.34.
Generated SHAP-based interpretability reports (global feature importance, summary/dependence plots, patient-level force plots) to validate fairness, flag spurious correlations/leakage, and support clinician sign-off and model governance.
Validated datasets and reports via automated checks and user-acceptance testing with stakeholders, ensuring data quality and reliable analytics for downstream research and decision-making.
Supported coursework and mentored 200+ students as a teaching assistant, reinforcing core machine-learning concepts and fostering collaboration in labs and office hours.
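The two regression metrics quoted for the length-of-stay models (R² and RMSE) follow the standard definitions. A minimal NumPy sketch is below, assuming arrays of observed and predicted length-of-stay values; the helper names and toy numbers are illustrative only and are not drawn from the HCUP study.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error, in the same units as the target (e.g. days)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical observed vs. predicted lengths of stay (days).
    y_true = np.array([2.0, 4.0, 6.0, 8.0])
    y_pred = np.array([3.0, 4.0, 5.0, 8.0])
    print(rmse(y_true, y_pred))
    print(r_squared(y_true, y_pred))
```

Because RMSE is in target units while R² is unitless, reporting both shows the typical error size and the share of variance explained.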

Aug 2022 – Jul 2023

Data Scientist - Safal Industries

Conducted comprehensive exploratory analysis of equipment, production, and sales data using Python, Pandas, SQL, and AWS SageMaker; uncovered trends that boosted productivity by ~11% and reduced costs by ~7%.
Developed and deployed an LSTM/ARIMA time-series forecasting model for five SKUs; reduced weighted absolute percentage error (WAPE) from 11% to 7% and optimized production schedules to cut inventory-holding costs by 3%.
Implemented a real-time Spark Streaming ETL pipeline with an XGBoost anomaly-detection model on Docker/Kubernetes; accelerated quality-control cycles by 20% and improved predictive-maintenance lead time by 15%.
Built and deployed a computer-vision defect-detection system using CNNs in TensorFlow; achieved ~92% accuracy and reduced inspection labor by 30%.
Fine-tuned BERT on 6k customer reviews to power sentiment-analytics dashboards; identified defect themes and informed product-design changes that reduced customer churn by 8%.
Implemented data-validation routines and collaborated with cross-functional stakeholders to define new data fields and reporting logic; delivered a Power BI dashboard of key production and sales KPIs that guided strategic planning and leadership decisions.
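WAPE, the forecasting metric cited above, divides the total absolute forecast error by total actual demand. A minimal Python sketch follows, assuming per-period actual and forecast quantities for a single SKU; the function name and the figures are hypothetical.

```python
from typing import Sequence


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum(|actual - forecast|) / sum(|actual|)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when total actual demand is zero")
    abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return abs_error / total_actual


if __name__ == "__main__":
    # Hypothetical weekly demand for one SKU and the model's forecasts.
    actual = [100, 120, 80, 100]
    forecast = [90, 125, 85, 100]
    print(f"WAPE = {wape(actual, forecast):.1%}")  # WAPE = 5.0%
```

Unlike MAPE, WAPE weights errors by volume, so low-demand periods cannot dominate the score, which makes it a common choice for SKU-level demand forecasting.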

May 2022 – Aug 2022

Data Scientist – Simulation & Visualization Intern - ISRO

Developed a reinforcement-learning Deep Q-Network (DQN) for autonomous rover navigation; tuned the reward function and ε-greedy decay schedule, then trained on 50k simulations, achieving 90% path completion on unseen test tracks.
Aggregated simulation data into MySQL and designed a Power BI dashboard tracking 10+ performance metrics, speeding up anomaly diagnosis by 20%.
Automated data collection with Python scripts and authored a user guide; conducted onboarding workshops that raised simulator adoption to 85% within a month.
De-risked ~$250k in hardware investment by delivering an autonomous-rover simulation MVP that secured departmental grants for continued R&D.
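The ε-greedy decay mentioned in the DQN bullet controls how often the agent explores a random action instead of exploiting its current Q-value estimates. The sketch below assumes an exponential decay schedule with hypothetical constants (`eps_start`, `eps_end`, `decay_steps`); the schedule and values actually used at ISRO are not stated here.

```python
import math
import random
from typing import Sequence


def epsilon_at(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
               decay_steps: float = 10_000.0) -> float:
    """Exponentially decay epsilon from eps_start toward eps_end (hypothetical constants)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay_steps)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.7, 0.3]  # hypothetical Q-values for three rover actions
    for step in (0, 10_000, 50_000):
        eps = epsilon_at(step)
        print(step, round(eps, 3), select_action(q, eps, rng))
```

Early in training ε is near 1, so the rover explores almost at random; as ε approaches `eps_end`, the policy increasingly follows the learned Q-values while retaining a small amount of exploration.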

Education

2023 – 2025

MS Computer Science - Pennsylvania State University

GPA: 3.63/4.00

Relevant Courses: Machine Learning, NLP, Advanced Statistics, Distributed Systems, Advanced Database Systems, Artificial Intelligence, Deep Learning

2019 – 2023

BTech Computer Science - Ahmedabad University

GPA: 3.41/4.00

Relevant Courses: Machine Learning, Deep Learning, Database Management, Big Data Analytics, Advanced Mathematics, Statistical Methods, Natural Language Processing, Predictive Modelling, Business Visualization, Computer Vision, Artificial Intelligence

Skills

Programming Languages

Python, R, SQL, Java, C/C++, JavaScript, HTML/CSS, Salesforce Apex

AI/ML & Deep Learning

PyTorch, TensorFlow, scikit-learn, XGBoost, LightGBM, CatBoost, KAN, NLTK, spaCy, Transformers

GenAI & LLM Frameworks

LangChain, LangGraph, CrewAI, LlamaIndex, RAG Pipeline, Generative AI, n8n

Data Science & Analytics

NumPy, Pandas, Dask, SciPy, Statsmodels, Prophet, Darts

Databases & Vector Stores

MySQL, PostgreSQL, Oracle, BigQuery, Snowflake, Redis, Pinecone, Weaviate, Qdrant, FAISS

Big Data & Cloud

Apache Spark, Apache Hadoop, Apache Airflow, AWS SageMaker, AWS Bedrock, AWS ECS, AWS EC2, AWS Lambda, AWS EKS, Vertex AI, Azure, GCP

BI & Visualization

Power BI, Tableau, Excel, DAX, Gephi

DevOps & Tools

Docker, Kubernetes, MLflow, Git, GitHub Actions, Linux, Flask, FastAPI, React, Streamlit

Product Management

Agile/Scrum, OKRs & KPIs, RICE Scoring, A/B Testing, Jira, Figma, Miro, Amplitude, PRD Writing, User Stories

Salesforce & CRM

Salesforce CRM, Sales Cloud, Salesforce Service Cloud, Salesforce Data Cloud, Salesforce Flows, Salesforce LWC, Salesforce Agentforce, Salesforce Agent Builder, Salesforce Prompt Builder

Specialized Skills

Recommendation Systems, Reinforcement Learning, Computer Vision, NLP, Predictive AI, Gurobi

Featured Projects

Resume Sense

Multi-agent AI framework for smart resume screening.

Churn Prediction Analysis

Predicting customer churn for a telecommunications company using supervised machine-learning techniques.

Fake News Detection System

LSTM model achieving 99.69% accuracy with NLTK-optimized text processing.

Multiclass Sentiment Analyzer

Fine-tuned BERT model achieving 23% higher classification performance on Twitter data.

Chest Doctor

ResNet-50 model with 75% overall accuracy and 100% class-specific accuracy on the COVID-19 class.

KATNET

A dynamic protein-ligand binding-affinity prediction model using a cross-attention-based 3D-CNN.

ML-Enhanced Forex Signal Engine

Machine-learning-powered engine for optimizing Forex trading signals using time-series models and deep learning.

Certifications

Salesforce Certified Agentforce Specialist

Salesforce | 2025

Generative AI with Large Language Models

Amazon Web Services | 2025

AWS Certified AI Practitioner

Amazon Web Services | 2024

Machine Learning Specialization

DeepLearning.AI | 2024

Learning Snowflake DB

LinkedIn | 2025

Publications

Data Augmentation in Convolutional Neural Networks for Channel Operating Margin Classification

IEEE International Conference on Consumer Electronics | 2026

KS Joshi, Prithvi Choudhary, Sedig S. Agili, Aldo W. Morales

Under Review

Leveraging Machine Learning to Predict Length of Stay for Hip Fractures in the Elderly: Insights from the Nationwide Emergency Department Sample (NEDS)

BMC Health Services Research | 2025

KS Joshi, Sara Imanpour, Md Faisal Kabir