Hello, World.

I'm Ashvi Soni.

Data Science, Data Engineering, Data Analysis, Machine Learning, NLP
Data Engineer, eClinical Solutions

More About Me

Let me introduce myself.

I am Ashvi Soni, a Computer Science graduate of the George Washington University (May 2021). I am a dedicated, detail-oriented professional specializing in data engineering and machine learning, and I am proficient in Python, SQL, and R. My work focuses on ETL processes, data visualization, and predictive modeling, and I am eager to learn and grow in a collaborative environment. My meticulous approach helps keep data accurate and supports informed decision-making.


My primary interests lie in Data Science, Machine Learning, Data Analysis, Data Engineering, NLP, AI, and Computer Vision. Please feel free to reach out if you believe I would be a good fit for your needs. I am enthusiastic about contributing my skills and knowledge to meaningful projects in these domains.

  • Fullname: Ashvi Soni
  • Interests: Data Science, Machine Learning, NLP, Computer Vision, Artificial Intelligence
  • Email: ashvi.y.soni@gmail.com


Below are highlights of my technical skills:

  • 90%
  • 95%
  • 90%
  • 90%
    Machine Learning
  • 90%
  • 85%
    ETL Pipeline
  • 85%
    Data Visualization Tools

My Resume can be downloaded here.

More of my credentials.

Here are my education and work experiences.


Master of Science in Computer Science

Aug 2019 - May 2021

The George Washington University

Recipient of the Global Initiative Fellowship

Relevant Coursework: Machine Learning, Artificial Intelligence, Computer Vision, Intro to Statistical NLP, Intro to Big Data and Analytics, Database Management System for Data Analytics, Design and Analysis of Algorithms, Advanced Software Paradigms, Computer System Architecture

Bachelor of Engineering in Computer Engineering

Aug 2015 - May 2019

Gujarat Technological University

Recipient of the Chief Minister Scholarship

Relevant Coursework: Artificial Intelligence, Data Structure, Analysis and Design of Algorithms, Calculus, Vector Calculus and Linear Algebra, Advanced Engineering Mathematics, Numerical and Statistical Methods for Computer Engineering, Theory of Computation, Computer Organization, Computer Networks, Operating System, Web Technology, Software Engineering, Python Programming

Professional Experience

Data Engineer

Jun 2022 - Present

eClinical Solutions

  • Perform Extract, Transform, and Load (ETL) processes to fetch clinical trial data from Electronic Data Capture (EDC) systems, vendor labs, and other relevant sources into the company's proprietary software using Amazon Web Services (AWS) and SFTP.
  • Perform data cleaning, transformation, and aggregation by writing Python and SQL scripts, working with large data sets, and optimizing queries for performance.
  • Develop Python (pandas) and SQL scripts to detect anomalies, create reconciliation reports between two datasets, identify trends, and generate custom reports to meet client-specific needs.
  • Develop Python-based machine learning models with pandas and scikit-learn to identify outliers in lab results and vitals, and apply time series analysis to detect atypical shifts. Present results through matplotlib and seaborn visualizations, and fine-tune the models through hyperparameter optimization.
  • Design and implement Python machine learning models using pandas and scikit-learn to identify inappropriate medication-indication pairs in study data. Employ the Score Matching technique to cross-reference recorded indications with an open-source dataset, and fine-tune these models through hyperparameter optimization.
  • Develop data visualizations using Qlik to turn complex clinical data into meaningful insights and analytics, creating interactive dashboards, reports, and charts that communicate findings to stakeholders.
  • Develop and standardize Python and R scripts for dynamically generating visual profiles for individual patients, ensuring adaptability to any retrieved data and enhancing flexibility and efficiency in data presentation across multiple clinical trial studies.

Programmer and Data Analyst

Aug 2021 - Jun 2022

The Biostatistics Center, George Washington University

Project: COVID-19 Community Research Partnership
  • Worked on the COVID-19 Community Research Partnership project, funded by the Centers for Disease Control and Prevention (CDC) and the State of North Carolina.
  • Developed an Extract, Transform, Load (ETL) pipeline for Electronic Health Records (EHR) data from 8 COVID-19 diagnosis sites, ensuring seamless data integration and accessibility.
  • Executed comprehensive data wrangling, cleaning, and organization using Python, R, and SQL scripts. Defined diagnosis conditions and their ICD-10-CM codes using HCUP databases to create analysis-ready data.
  • Conducted exploratory, descriptive, and predictive data analysis, leveraging Python and R, to extract meaningful insights and patterns from the dataset.
  • Created informative data visualizations to communicate findings effectively, facilitating better understanding and decision-making.
  • Implemented data mapping using Python, SQL, and R scripts to correlate medications from participants’ EHR datasets to their respective active ingredient attributes.
  • Defined development standards and documented the entire process workflow, standardizing the data wrangling process for future projects.
  • Demonstrated strong analytical skills by identifying a significant HIPAA violation within the data sets, ensuring compliance and confidentiality.
  • Organized and led meetings with 10 different research sites, creating agendas and distributing meeting minutes to ensure effective communication and collaboration.
  • Proactively sought opportunities for process improvement and implemented enhancements to optimize data analysis efficiency and accuracy.

Junior Data Scientist

Aug 2021 - Jun 2022

The Biostatistics Center, George Washington University

Project: EDIC
  • Worked on the Epidemiology of Diabetes Interventions and Complications (EDIC) research study, funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK).
  • Developed and implemented six distinct predictive models (Logistic Regression, Random Forest, SVM, K-Nearest Neighbors, XGBoost, and a Neural Network) to forecast the occurrence of hypoglycemia in participants with Type 1 diabetes.
  • Fine-tuned the predictive models, improving their sensitivity and specificity by 50%.
  • Worked collaboratively with cross-functional teams, including statisticians and researchers, to integrate data science findings into the broader context of the research study.
  • Generated detailed reports on model performance, key findings, and recommendations, facilitating informed decision-making by the research team.
  • Maintained comprehensive documentation of the data science workflow, ensuring transparency and reproducibility of analyses for future reference.
  • Stayed abreast of the latest advancements in data science and machine learning, implementing new methodologies and best practices to enhance the overall effectiveness of the predictive models.

Teaching Experience

Instructional Assistant

Oct 2020 - May 2022

Introduction to Python for Public Health Research, Introduction to R for Public Health Research

  • Served as a Graduate Teaching Assistant for two courses, Introduction to Python and Introduction to R.
  • Mentored a total of 130 students, offering individual guidance during office hours to address queries and provide support related to coursework.
  • Took initiative in autonomously managing the grading process for assignments, leading to a more efficient workflow.
  • Significantly improved the learning outcomes of 130 students by delivering timely and detailed feedback on homework assignments.
  • Proficiently managed the Learning Management System, Blackboard, ensuring seamless tracking of course activities and maintaining organized course materials.
  • Advised students individually, clarifying materials and addressing difficulties they encountered, contributing to a positive and supportive learning environment.

Check Out Some of My Work.

Some of my projects are here.


I'd Love To Hear From You.

Let me get to know more about you.