ANALYSIS AND CHURN PREDICTION OF THE PROJECT MANAGEMENT SYSTEM
Abstract
Using data from the project management system planfix.com, we predict customer churn at the end of the free trial period. Predictive lead scoring (predicting which trial users will convert into paying customers) is a common machine learning task. Companies usually solve it by analyzing their own data, which makes general-purpose software hard to apply. We describe an algorithm for preprocessing data from a MySQL database into a dataframe in "tidy data" form, which is convenient for machine learning methods and contains no unnecessary information or missing values. We select the quantitative and binary features that significantly influence the target variable, using statistical criteria for multiple hypothesis testing. To predict conversion, we use three classification algorithms: logistic regression, decision trees, and random forest. We show that each of these algorithms copes well with the task. The most significant features were selected with the decision tree and then used as inputs for the more complex models. The random forest model was the most accurate at classifying customers by the target attribute. Logistic regression made it possible to calculate the probability of a customer converting into a paying customer based on the customer's usage of various additional services of the system. We compare the resulting models and identify the characteristics of a customer's account that most affect the chances of the customer switching to a paid version after the free trial ends. We give recommendations for further research, including selecting the most effective form of the random forest model so that predictive customer analysis can be used within the software product. Preprocessing and model building were done in the R programming language.
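The feature selection step relies on criteria for testing multiple hypotheses, but the abstract does not name the correction that was applied. As an illustration only, the sketch below uses Holm's step-down procedure, which is one common choice and is an assumption here, not necessarily the authors' method. It is written in Python rather than the R used in the study, and the feature names and p-values are hypothetical.

```python
def holm_select(p_values, alpha=0.05):
    """Return the features whose null hypothesis is rejected by Holm's
    step-down procedure at family-wise error level alpha.

    p_values maps a feature name to the p-value of its individual test
    (for example, a test of association with the conversion flag).
    """
    # Sort features from the smallest p-value to the largest.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    selected = []
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (k = rank + 1) is compared
        # with alpha / (m - k + 1) = alpha / (m - rank).
        if p <= alpha / (m - rank):
            selected.append(name)
        else:
            # Step-down rule: stop at the first hypothesis that is not rejected.
            break
    return selected


# Hypothetical p-values, for illustration only (not taken from the study).
example_p_values = {
    "tasks_created": 0.001,
    "users_invited": 0.012,
    "mobile_app_used": 0.04,
    "avatar_uploaded": 0.30,
}

selected_features = holm_select(example_p_values)
print(selected_features)  # ['tasks_created', 'users_invited']
```

With four hypotheses, the thresholds are 0.0125, 0.0167, 0.025, and 0.05, so the third feature (p = 0.04) fails its threshold and the procedure stops there, even though 0.04 would pass an uncorrected 0.05 test.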
