

I will do AI, ML, data science and data analysis tasks in Python



Python is a popular programming language for data science, machine learning (ML) and artificial intelligence (AI) tasks [1]. You can use Python libraries such as pandas, NumPy and Beautiful Soup for tasks such as data collection through APIs and web scraping [2].


There are many Python projects for data science that you can work on in 2023 [3]. You can also automate exploratory data analysis with Python libraries such as Pandas Profiling, Sweetviz and AutoViz [4].
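Profiling libraries such as Pandas Profiling generate full reports automatically. As a rough illustration of the per-column statistics such reports start from, here is a minimal sketch in plain pandas. It does not use any of those libraries' APIs, and the function name quick_eda and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd


def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with basic profiling statistics (hypothetical helper)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),                    # storage type of each column
        "missing": df.isna().sum(),                        # count of missing values
        "missing_pct": (df.isna().mean() * 100).round(1),  # share of missing values, in %
        "unique": df.nunique(dropna=True),                 # distinct non-missing values
    })


# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "user_age": [25, 31, np.nan, 42],
    "song_genre": ["pop", "rock", "pop", None],
})
print(quick_eda(df))
print(df.describe())  # numeric summary: count, mean, std, quartiles
```

The dedicated libraries add plots, correlations and HTML output on top of summaries like this one.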

Why Practice with Python Data Science Project Ideas?

Over the years, Python has come to enjoy near-celebrity status in data science. Data enthusiasts love it, and it provides an easy introduction to data science and machine learning. It is easy to write and has a rich ecosystem of libraries for complex data science tasks. Python also owes its place among the favourites to its readable code: its syntax is minimal compared with more verbose, heavyweight languages. A non-exhaustive list of Python libraries for data science includes seaborn, matplotlib, scikit-learn, NumPy, SciPy, requests, pandas and regex. This makes Python a fine choice for beginners getting started with data science. The best way to learn any technology or programming language is by doing, so here is a curated list of Python data science projects to help you start your learning journey and gain the hands-on experience needed for a data science job.

For the music recommendation project, the dataset contains metadata for users and songs. The metadata includes user-specific and song-specific fields such as user_ID, user_Registration_date, song_ID, song_genre, song_ArtistName and song_releaseDate. The dataset also records the time at which a user first played a song; this value is unique for each user-song pair.

There are three files in the dataset:

  • Train.csv - It stores data on user-song pairs, like user_id, source_system_tab, source_type, source_screentime and target. The target indicates whether the user listened to the same track again within one month:
      • Target = 1 means the user repeated the song within 30 days
      • Target = 0 means the user did not repeat the song
  • Songs.csv - It contains data on songs, like song_id, song_genre, song_artist, song_lyricist, etc.
  • members.csv - It contains user account data, like user_name, user_age, user_gender, user_subscription_plan, etc.
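The three files have to be combined into one table before cleaning and modelling. A minimal pandas sketch follows, assuming train.csv shares a song_id key with the songs file and a user_id key with the members file (the text above lists user_name for members.csv, so the user key is an assumption). Small in-memory frames with simplified, hypothetical columns stand in for pd.read_csv calls on the real files.

```python
import pandas as pd

# Toy stand-ins for Train.csv, Songs.csv and members.csv.
# With the real files you would use pd.read_csv(...) instead;
# the column names here are simplified assumptions.
train = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["s1", "s2", "s1"],
    "source_type": ["playlist", "radio", "album"],
    "target": [1, 0, 1],
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_genre": ["pop", "rock"],
    "song_artist": ["A", "B"],
})
members = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "user_age": [24, 35],
    "user_gender": ["female", "male"],
})

# Left joins keep every user-song pair, even if its song or user metadata is missing
data = (
    train
    .merge(songs, on="song_id", how="left")
    .merge(members, on="user_id", how="left")
)
print(data)
print(data["target"].value_counts())  # balance of repeats (1) vs non-repeats (0)
```

Using how="left" means a song or user absent from the metadata files shows up as missing values rather than silently dropping training rows, which feeds directly into the cleaning steps.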

Music Recommendation Data Science Project with Python

Data Cleaning

The dataset can contain anomalies, outliers and missing values, which can reduce the efficiency and accuracy of the algorithms trained on it. We need to normalise the data and make it uniform throughout. In some real-world datasets, as much as 20-40% of values can be outliers or missing.

We use the following techniques to clean the data:

1. Outlier Detection and Treatment 

Outliers are implausible values that fall outside the permissible range for a feature. For example, a user's age below 0 or above 100 can be considered absurd. The range can be stricter in some cases: for a dataset about purchasing liquor, valid ages might be limited to between 18 and 100.
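The range rule above translates directly into pandas. This is a minimal sketch, assuming out-of-range values should be turned into missing values so they can be imputed in the next step; the helper name treat_range_outliers and the sample ages are hypothetical.

```python
import pandas as pd


def treat_range_outliers(s: pd.Series, low: float, high: float) -> pd.Series:
    """Replace values outside [low, high] (inclusive) with NaN (hypothetical helper)."""
    # between() is inclusive on both ends by default; where() keeps True rows, NaN elsewhere
    return s.where(s.between(low, high))


ages = pd.Series([23, -4, 57, 130, 17, 45])

general = treat_range_outliers(ages, 0, 100)  # general rule: 0 to 100
liquor = treat_range_outliers(ages, 18, 100)  # stricter rule for liquor purchases
print(general.tolist())
print(liquor.tolist())
```

If dropping the information is undesirable, ages.clip(lower=0, upper=100) caps extreme values at the bounds instead of blanking them.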

2. Imputing Missing Values 

Imputation means replacing missing values in the dataset with substitute values.

We categorise user-song pairs under two labels: repeats and non-repeats.

  • Replacing missing values with appropriate data - Missing values are replaced with either the mode or the median of the column.
  • Removing all null values - Every data point with missing data is dropped. This shrinks the dataset and loses information.
  • Creating a new 'Missing' label - A new category called ‘missing’ is created for data points with a missing value, grouping them under one single category.
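The three strategies above map onto standard pandas calls. A minimal sketch on a hypothetical two-column frame, assuming the median suits the numeric column and the mode suits the categorical one:

```python
import pandas as pd

df = pd.DataFrame({
    "user_age": [24.0, None, 35.0, 29.0],
    "song_genre": ["pop", "rock", None, "pop"],
})

# 1. Replace missing values: median for the numeric column, mode for the categorical one
filled = df.copy()
filled["user_age"] = filled["user_age"].fillna(filled["user_age"].median())
filled["song_genre"] = filled["song_genre"].fillna(filled["song_genre"].mode()[0])

# 2. Remove every row that has at least one null value
dropped = df.dropna()

# 3. Put missing categorical values into their own 'missing' category
labelled = df.copy()
labelled["song_genre"] = labelled["song_genre"].fillna("missing")

print(filled, dropped, labelled, sep="\n\n")
```

Here the median age is 29 and the most frequent genre is pop, so those fill the gaps in strategy 1, while strategy 2 keeps only two of the four rows, illustrating the data loss mentioned above.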

Finally, convert string labels into their numerical counterparts.
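Since sklearn is among the listed libraries, one way to do this conversion is its LabelEncoder. The sketch below applies it per column on a hypothetical frame and keeps each fitted encoder so codes can be mapped back to labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "song_genre": ["pop", "rock", "pop", "jazz"],
    "source_type": ["radio", "album", "radio", "playlist"],
})

encoders = {}
for col in ["song_genre", "source_type"]:
    enc = LabelEncoder()
    # fit_transform sorts the distinct labels and assigns codes 0, 1, 2, ...
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc  # keep the encoder to translate codes back later

print(df)
print(encoders["song_genre"].inverse_transform([0, 1, 2]))
```

scikit-learn documents LabelEncoder as intended for target values; for feature columns, sklearn.preprocessing.OrdinalEncoder performs the same mapping across several columns at once.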

Libraries

pandas, sklearn, NumPy

This project will evaluate the following modelling approaches to build a music recommender system:

  • Logistic regression

Logistic regression is the simplest of these algorithms. In Python, it is available as a linear model in the sklearn library (sklearn.linear_model.LogisticRegression).

  • Decision Tree 

A decision tree reaches a prediction by passing each sample through a tree of tests on its feature values. At each internal node, the sample follows one of the branches depending on the outcome of the test. The leaf it finally reaches gives the predicted result.

  • Random Forest

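A Random Forest is a collection of Decision Trees. Each tree is trained on a random bootstrap sample of the data and considers a random subset of features at each split, and the forest combines the trees' predictions by voting or averaging.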
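To compare the three approaches side by side, here is a minimal scikit-learn sketch. It assumes a synthetic binary dataset from make_classification as a stand-in for the encoded user-song features and the 0/1 repeat target; the hyperparameters (max_depth=8, n_estimators=100) are illustrative choices, not tuned values from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features (X) and repeat target (y)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out test split
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

All three estimators share the same fit/score interface, so swapping in the real merged and cleaned KKBox features only changes how X and y are built.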

Source code with guided videos: Music Recommender System using KKBox

Basic: $50

Bug fixing, data loading, basic plots, project consultation.

Standard: $100

Data analysis, preprocessing, visualization, machine learning models.

Advanced: $200

Advanced data preprocessing, data analysis, feature extraction, model tuning and optimization.
