Data-Driven Investment Strategies for Peer-to-Peer Lending

Machine learning approach to loan portfolio optimization

This project develops a data-driven investment strategy for peer-to-peer lending platforms. It analyzes 230,000+ historical LendingClub loans and combines predictive modeling with portfolio optimization to reach roughly 28% expected returns at 8.4% portfolio risk.

28% Expected Returns
8.4% Portfolio Risk
230K+ Loans Analyzed
4x Better Performance

Project Overview

Technical Approach

Data Science & Analytics

  • Comprehensive analysis of 230,000+ loan records
  • Feature engineering with derived return metrics
  • K-means clustering for borrower segmentation
  • Principal Component Analysis (PCA) for dimensionality reduction and exploring borrower features
  • Statistical validation and hypothesis testing
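
As a rough illustration of the segmentation and PCA steps listed above, the sketch below clusters synthetic borrower data with scikit-learn. The feature columns, the choice of four clusters, and the random data are all assumptions made for the example; they are not taken from the project's notebooks.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for borrower features (hypothetical columns).
rng = np.random.default_rng(0)
borrowers = np.column_stack([
    rng.normal(65_000, 20_000, 500),  # annual income ($)
    rng.normal(18, 6, 500),           # debt-to-income ratio (%)
    rng.normal(50, 20, 500),          # revolving utilization (%)
    rng.normal(14_000, 6_000, 500),   # loan amount ($)
])

# Standardize so that no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(borrowers)

# Segment borrowers; k=4 is an arbitrary choice for illustration.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(scaled)

# Project onto two principal components to inspect the segments.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_

print("cluster sizes:", np.bincount(cluster_labels, minlength=4))
print("variance explained by 2 PCs:", round(float(explained.sum()), 3))
```

In practice the number of clusters would be chosen with a diagnostic such as the elbow method or silhouette scores rather than fixed up front.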

Machine Learning

  • Classification models for default prediction
  • Regression analysis for return forecasting
  • Cross-validation and model performance optimization
  • Class imbalance handling techniques
  • Feature selection and importance analysis
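
A minimal sketch of default classification with class imbalance handling and cross-validation, using scikit-learn. The logistic regression model, the balanced class weights, and the synthetic data with roughly 15% defaults are assumptions for illustration; the project may have used different models or resampling techniques.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data: roughly 15% of loans "default" (label 1).
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    weights=[0.85, 0.15],
    random_state=0,
)

# class_weight="balanced" reweights classes inversely to their frequency,
# so the minority (default) class is not ignored during training.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Stratified folds keep the default rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print("ROC AUC per fold:", auc_scores.round(3))
```

ROC AUC is used here because plain accuracy can look high on imbalanced data even when most defaults are missed.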

Optimization & Finance

  • Portfolio optimization with integer programming
  • Constrained optimization with budget limits
  • Cluster-based risk assessment methodology
  • Markowitz-type risk-return models
  • Sensitivity analysis and parameter tuning
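
One way a loan-selection model of this kind could be written is shown below. This is a sketch under stated assumptions, not the project's exact formulation: the symbols $a_i$ (loan amount), $\hat{r}_i$ (predicted return), $\sigma_{c(i)}$ (risk of the cluster containing loan $i$), $\lambda$ (risk-aversion weight), $B$ (budget), and $m$ (maximum loans per cluster) are all introduced here for illustration.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n} a_i \left(\hat{r}_i - \lambda\,\sigma_{c(i)}\right) x_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} a_i x_i \le B, \\
& \sum_{i \,:\, c(i) = k} x_i \le m \quad \text{for each cluster } k, \\
& x_i \in \{0, 1\} \quad \text{for } i = 1, \dots, n.
\end{aligned}
```

The linear risk penalty is a simplification of the classical Markowitz model, which penalizes portfolio variance through a quadratic term. It keeps the problem a pure integer linear program.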

Methodology & Results

Three-Phase Implementation

Phase 1: Filtering & Analysis

  • Removed high-risk loans by identifying default-prone borrowers
  • Comprehensive data cleaning and validation
  • Exploratory analysis and pattern discovery
  • Cluster analysis to understand borrower segments

Phase 2: Predictive Modeling

  • Trained models on historical loan performance data
  • Default probability prediction using classification
  • Return prediction with regularized regression
  • Model validation and performance benchmarking
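
The return-prediction step can be sketched with a ridge regression, one common form of regularized regression. The choice of ridge (rather than, say, lasso), the `alpha` value, and the synthetic features and returns are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic loan features and annualized returns (hypothetical data).
rng = np.random.default_rng(1)
n_loans = 1000
X = rng.normal(size=(n_loans, 6))
true_coef = np.array([0.04, -0.03, 0.02, 0.0, 0.0, 0.01])
y = 0.10 + X @ true_coef + rng.normal(scale=0.02, size=n_loans)

# Hold out a quarter of the loans to benchmark out-of-sample error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# alpha controls the strength of the L2 penalty on the coefficients.
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
predicted = ridge.predict(X_test)
mae = mean_absolute_error(y_test, predicted)

print("mean absolute error on held-out loans:", round(float(mae), 4))
```

The held-out error gives a benchmark for comparing alternative models or penalty strengths.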

Phase 3: Portfolio Optimization

  • Cluster-based risk assessment framework
  • Integer programming for optimal loan selection
  • Constraint optimization with diversification rules
  • Strategy backtesting and sensitivity analysis
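
To make the selection step concrete, the sketch below picks loans under a budget and a one-loan-per-cluster diversification rule. It uses exhaustive enumeration in plain Python as a stand-in for an integer programming solver such as Gurobi or CPLEX, which is only practical for a handful of candidates. The loan data, the risk-adjusted objective, and the constants are hypothetical.

```python
from itertools import combinations

# Hypothetical candidates: (id, amount $, predicted return, cluster, cluster risk).
loans = [
    ("A", 1000, 0.29, 0, 0.07),
    ("B", 1500, 0.31, 0, 0.07),
    ("C", 1000, 0.26, 1, 0.09),
    ("D", 2000, 0.30, 1, 0.09),
    ("E", 500, 0.24, 2, 0.06),
    ("F", 1500, 0.27, 2, 0.06),
]
BUDGET = 4000          # total amount available to invest
MAX_PER_CLUSTER = 1    # diversification rule (assumed)
RISK_AVERSION = 1.0    # weight on cluster risk in the objective (assumed)


def score(subset):
    """Risk-adjusted dollar return of a set of loans."""
    return sum(
        amount * (ret - RISK_AVERSION * risk)
        for _, amount, ret, _, risk in subset
    )


def feasible(subset):
    """Check the budget and per-cluster limits."""
    if sum(loan[1] for loan in subset) > BUDGET:
        return False
    counts = {}
    for loan in subset:
        counts[loan[3]] = counts.get(loan[3], 0) + 1
    return all(count <= MAX_PER_CLUSTER for count in counts.values())


# Enumerate every subset and keep the best feasible one.
candidates = (
    subset
    for size in range(len(loans) + 1)
    for subset in combinations(loans, size)
    if feasible(subset)
)
best = max(candidates, key=score)
selected_ids = [loan[0] for loan in best]
total_invested = sum(loan[1] for loan in best)

print("selected loans:", selected_ids, "invested:", total_invested)
```

With thousands of candidate loans the number of subsets explodes, which is why a dedicated solver is the usual tool for the real problem.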

Risk Assessment Innovation: Loans are grouped into clusters of similar borrowers, and each cluster's risk is measured as the standard deviation of predicted returns within that cluster. Selecting loans from low-variability clusters kept overall portfolio risk at 8.4%.
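
The cluster-based risk measure described above can be sketched in a few lines. The predicted returns and cluster assignments are made up for the example, and the use of the population standard deviation (`pstdev`) rather than the sample version is an assumption.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical (cluster id, predicted return) pairs for eight loans.
predictions = [
    (0, 0.28), (0, 0.30), (0, 0.26),
    (1, 0.20), (1, 0.35), (1, 0.29),
    (2, 0.25), (2, 0.27),
]


def cluster_risk(pairs):
    """Return {cluster: std. dev. of predicted returns in that cluster}."""
    by_cluster = defaultdict(list)
    for cluster, predicted_return in pairs:
        by_cluster[cluster].append(predicted_return)
    return {cluster: pstdev(values) for cluster, values in by_cluster.items()}


risk = cluster_risk(predictions)
for cluster, value in sorted(risk.items()):
    print(f"cluster {cluster}: risk = {value:.4f}")
```

Here cluster 1 has the widest spread of predicted returns, so a loan from it would be treated as riskier than one from cluster 0 or 2, even if its own predicted return is high.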

Key Findings & Performance

Optimization Success: The model-based strategy achieved 28.5% returns at 8.4% risk, a return-to-risk ratio of about 3.4 (28.5 / 8.4). It significantly outperformed random selection, which achieved only 11.5% returns.

Strategy Performance Comparison

28.5% Optimized Portfolio Return
8.4% Portfolio Risk
3.4 Risk-Return Ratio
11.5% Random Selection Return

Investment Strategy Insights

  • Grade B & C Focus: Optimal balance of risk and return, avoiding high-risk grades while maintaining strong performance
  • Predicted Returns 24-31%: All selected loans show strong return potential based on historical data analysis
  • Low-Risk Clusters: Loans selected from borrower clusters with stable return patterns and controlled variability
  • Diversification Strategy: Smaller loan amounts distributed across multiple borrowers for risk mitigation
  • Scalable Framework: Strategy remains effective across different portfolio sizes and budget constraints

Tools & Resources

Complete Project Resources

Technologies Used

  • Python for data analysis and modeling
  • Jupyter Notebooks for interactive development
  • scikit-learn for machine learning algorithms
  • pandas & numpy for data manipulation
  • Gurobi/CPLEX for optimization
  • matplotlib & seaborn for visualization

Available Resources

  • Complete source code and documentation
  • Jupyter notebooks with detailed analysis
  • Processed dataset ready for analysis
  • Model evaluation and results
  • Portfolio optimization implementation
  • Comprehensive README with setup instructions

Repository Highlights: The GitHub repository contains complete Jupyter notebooks for each phase of the analysis, documented Python modules for reproducible results, and detailed explanations of the methodology. The dataset includes 230,000+ loan records from LendingClub with comprehensive borrower information.

Project Impact & Applications

Key Deliverables

  • Complete end-to-end data science pipeline with 230,000+ loan analysis
  • Validated predictive models supporting a portfolio with a return-to-risk ratio of 3.4
  • Portfolio optimization engine using integer programming
  • Comprehensive strategy comparison framework
  • Actionable investment recommendations with 28% expected returns

Business Applications

  • Scalable framework for individual and institutional investors
  • Automated loan selection reducing manual analysis time
  • Risk management methodology applicable to various lending platforms
  • Data-driven approach that outperformed random loan selection (28.5% vs. 11.5% returns)
  • Transferable optimization techniques for portfolio management