Payday lending – how low can they go?

A post from TMM referenced the British payday lender Wonga. As this form of credit is either uncommon or illegal in Germany, a word about their business model: they give out very small loans (upper limit 400 pounds) for a short time, up to 30 days. For this they charge 5.50 pounds in fees and roughly 1% interest per day, based on a yearly rate of 365%.
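To make the pricing concrete, here is a minimal sketch of what a borrower would owe under the terms just described. The helper name `loanCost` is made up for illustration, and it assumes simple (non-compounding) daily interest plus the flat fee, which may not match the lender's exact terms.

```r
# Hypothetical helper: total amount owed on a payday loan.
# Assumes simple (non-compounding) interest of 1% per day plus a flat fee.
loanCost = function(loan, days, dailyRate = 0.01, fee = 5.5) {
    interest = loan * dailyRate * days
    data.frame(loan = loan, days = days, interest = interest, fee = fee,
        total = loan + interest + fee)
}

# Largest loan for the longest term: 400 pounds for 30 days
loanCost(400, 30)

# A small, short loan: 50 pounds for 7 days
loanCost(50, 7)
```

Under these assumptions the maximum loan costs the borrower 120 pounds in interest plus the fee within a single month.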

Nice business, if you can get it. What piqued my interest was the question: how large a loss can they take on these loans and still make a decent return on investment? After fiddling around with numbers I pulled from thin air, the result is: they can trawl the bottom of the sea and still make a living. Here is my reasoning:

I simulate a credit portfolio of 100,000 loans within a business year of 365 days, ignoring weekends and holidays. Loan size and duration are uniformly distributed within the limits set by Wonga. I assign each loan a probability of default between 80% and 100%. I will define the time reference to which this probability applies in the next step, so the value can be interpreted as “the borrower will go bust within the next \(x\) days with likelihood \(y\)”. I further assume equity of 1 million pounds to start the business. This ensures that the lender himself does not end up in the red.

numYears = 1
numCost = 1e+05 * numYears
numDays = 365 * numYears
equity = 1e+06

credits = data.frame(id = 1:numCost, loan = round(runif(numCost, min = 1, max = 400), 
    0), duration = as.integer(runif(numCost, min = 1, max = 30)), loanday = as.integer(runif(numCost, 
    min = 1, max = numDays)), pd = runif(numCost, min = 0.8, max = 1))

credits$repayday = credits$loanday + credits$duration

I calculate the outstanding amount on the repayment day as the size of the loan plus 1% interest per day of duration. I assume that the fixed fee of 5.50 pounds covers the fixed costs of the loan. To calculate the actual repayment made to the lender, I make a random draw to simulate the possibility of default, where the base probability of default is weighted by the duration of the loan relative to the time reference chosen for the probability of default. Usually this is one year, but I will vary it in the analysis.
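The weighting can be illustrated for a single loan. This is only a sketch of the linear scaling described above. The function names `survivalProb` and `expectedReturn` are made up for the example and do not appear in the simulation itself.

```r
# pd:        probability of default over the reference period
# duration:  loan term in days
# reference: length of the reference period in days (the "time reference")
survivalProb = function(pd, duration, reference = 365) {
    1 - pd * duration / reference
}

# Expected repayment per pound lent: principal plus 1% per day,
# weighted by the probability that the borrower survives the term
expectedReturn = function(pd, duration, reference = 365) {
    (1 + duration / 100) * survivalProb(pd, duration, reference)
}

# A 30-day loan to a borrower with a 90% default probability per year
survivalProb(0.9, 30, 365)
expectedReturn(0.9, 30, 365)

# Same borrower, but the 90% now refers to the next 90 days
expectedReturn(0.9, 30, 90)
```

A value above 1 means the loan pays off on average. Shortening the reference period pushes the expected return below 1, which is the effect the analysis varies.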

In the next step I calculate the cashflow and the balance sheet during the business year, and calculate the return on investment as the value of the balance sheet at the end of the year, divided by the equity.

calcRoi = function(credits, defaultDenum = 365) {
    # Amount due (loan plus 1% per day) times a 0/1 draw for "borrower repays"
    credits$repayment = (credits$loan * (1 + credits$duration/100)) * rbinom(numCost, 
        1, (1 - credits$pd * credits$duration/defaultDenum))

    cashflow = data.frame(day = 1:numDays, outflow = 0, inflow = 0, balance = 0)
    cashflow$balance[1] = equity
    for (day in 1:numDays) {
        # Loans paid out and repayments received on this day
        cashflow$outflow[day] = sum(credits$loan[which(credits$loanday == day)])
        cashflow$inflow[day] = sum(credits$repayment[which(credits$repayday == 
            day)])
        if (day > 1) {
            # Carry the balance forward: add repayments, subtract new loans
            cashflow$balance[day] = cashflow$balance[day - 1] + cashflow$inflow[day] - 
                cashflow$outflow[day]
        }
    }
    # ROI: balance at the end of the year relative to the starting equity
    (ROI = cashflow$balance[nrow(cashflow)]/equity)
}

I now calculate the ROI for different time references in the range from 1 month to 1 year and plot the result:

[Plot: ROI against the time reference of the default probability]

The red line is the break-even and the blue line marks 10% ROI.
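The sweep over time references can be sketched in a self-contained way. Note that this is not the cashflow simulation from `calcRoi`: as a simplifying assumption it uses a smaller portfolio and reports total repayments divided by total amount lent, so only the break-even point is roughly comparable, not the ROI on equity. The name `portfolioReturn` and the seed are illustrative choices.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n = 10000  # smaller than the 100,000 loans in the post, to keep it quick
loan = round(runif(n, min = 1, max = 400))
duration = as.integer(runif(n, min = 1, max = 30))
pd = runif(n, min = 0.8, max = 1)

# Gross return of the portfolio: total repayments divided by total lent
portfolioReturn = function(reference) {
    repaid = rbinom(n, 1, 1 - pd * duration / reference)
    sum(loan * (1 + duration / 100) * repaid) / sum(loan)
}

# Time references from one month to one year, in steps of five days
references = seq(30, 365, by = 5)
returns = sapply(references, portfolioReturn)

plot(references, returns, type = "l",
    xlab = "Time reference of the default probability (days)",
    ylab = "Repayments / amount lent")
abline(h = 1, col = "red")  # break-even
```

The curve rises with the length of the time reference, because the same default probability spread over a longer period means fewer defaults within the short term of each loan.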

As you can see, the payday lender will make a decent ROI of 10% even with an average probability of default of 90% within the next 3-4 months, even if he writes off the defaulted loans completely. The crucial part of the business will thus be walking the fine line of selecting borrowers who are nearly bust, so that they are desperate enough to apply for this loan, but not yet bust, so that the lender does not lose too large a part of the credit portfolio.

Posted in General

Re: Data Paranoia – maybe justified this time?

Menzie Chinn responded to a post on Zero Hedge in which someone claimed that the September spread between the gain in full-time jobs and the loss in part-time jobs was unusually high. Menzie replied that it is well within what one would expect given the history of these time series. Thankfully he gave links to the data in the FRED database. As I was wasting time anyway, I thought I would take a look at the data.

## [1] "LNS12600000" "LNS12500000"

[Plot: the two FRED series, LNS12600000 and LNS12500000]

So, we have a set of 549 monthly observations from January 1968 to September 2013. As in Menzie's piece, I take the first difference of the logs to get monthly rates of change.

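# The data argument was lost from this line; merging the two FRED series
# (full-time first, to match the column names below) is an assumption.
dfDL = data.frame(apply(merge(LNS12500000, LNS12600000), 2, function(x) diff(log(x))))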
names(dfDL) = c("FullTime", "PartTime")
##     FullTime           PartTime       
##  Min.   :-0.02117   Min.   :-0.03279  
##  1st Qu.:-0.00073   1st Qu.:-0.00548  
##  Median : 0.00126   Median : 0.00149  
##  Mean   : 0.00108   Mean   : 0.00182  
##  3rd Qu.: 0.00315   3rd Qu.: 0.00792  
##  Max.   : 0.01482   Max.   : 0.10856
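For readers unfamiliar with the transformation: the first difference of the logs is a close approximation of the period-on-period percentage change when changes are small. A tiny sketch with made-up numbers (not the FRED data):

```r
# Toy series (made-up numbers, not the FRED data)
x = c(100, 102, 101, 105)

logDiff = diff(log(x))              # first difference of the logs
pctChange = diff(x) / head(x, -1)   # exact period-on-period change

round(cbind(logDiff, pctChange), 4)
```

The two columns differ only slightly for changes of a few percent, and the series loses its first observation in the process.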

We define the September 2013 values as cut-off points and pick out the data points with an even more extreme spread between full-time and part-time.

# In case someone would try to use the code at a later time.
sep13 = which(index(LNS12500000) == "2013-09-01") - 1
cutOffs = dfDL[sep13, ]
lowerRight = dfDL[which(dfDL$FullTime > cutOffs$FullTime & dfDL$PartTime < cutOffs$PartTime), 
    ]

Looking at the scatter plot, we would expect a negative correlation between the two variables:

[Scatter plot: monthly rates of change of full-time and part-time employment]

And this is confirmed: the correlation is -0.3922. So, at first glance it does not seem unlikely that there should be a large increase in full-time jobs together with a large decrease in part-time jobs. In fact, it is the expected result that, given a high value for one series, we see a high value of the opposite sign for the other series.

Turning to the empirical likelihood of the event: the red dot is September 2013 and the blue dots are months with larger spreads. There are 5 such points, which gives us a frequency for this event of 1.0929%. The next step is to see whether the event is statistically more unlikely than the observed frequency suggests.

We assume a bivariate normal distribution with observed mean values and covariance matrix:

mu = colMeans(dfDL)
sigma = cov(dfDL)
dfSum = data.frame(Mean = mu, StDev = apply(dfDL, 2, sd))
##              Mean    StDev
## FullTime 0.001081 0.003537
## PartTime 0.001824 0.011828

As we can see, the mean rate of the part-time series is about 70% higher than that of the full-time series. It also has a much larger standard deviation.

To calculate the probability of the joint event that the full-time rate in a given month is \(\geq\) 0.5929% and that the part-time rate in a given month is \(\leq\) -2.1443%, we put the numbers into R:

library(mvtnorm)  # provides pmvnorm
probEvent = pmvnorm(upper = unlist(c(Inf, cutOffs[2])), lower = unlist(c(cutOffs[1], 
    -Inf)), mean = mu, sigma = sigma)

And the result is 0.7707%, or about once every 10.8128 years, which is in the same ballpark as the observed frequency of 1.0929%, or once every 7.625 years.
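As a rough cross-check that needs no extra packages, the same probability can be estimated by simulation. This sketch plugs in the rounded means, standard deviations, correlation, and cut-offs quoted in the text, so it will only approximate the `pmvnorm` figure. The helper `yearsBetween` is a made-up name for the conversion from a monthly probability to an expected waiting time in years.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Rounded values quoted in the post
mu = c(FullTime = 0.001081, PartTime = 0.001824)
sdev = c(FullTime = 0.003537, PartTime = 0.011828)
rho = -0.3922
cutFull = 0.005929   # full-time rate at least this high
cutPart = -0.021443  # part-time rate at most this low

# Draw correlated standard normals: z2 has correlation rho with z1
n = 1e6
z1 = rnorm(n)
z2 = rho * z1 + sqrt(1 - rho^2) * rnorm(n)
fullTime = mu[["FullTime"]] + sdev[["FullTime"]] * z1
partTime = mu[["PartTime"]] + sdev[["PartTime"]] * z2

# Share of simulated months at least as extreme as September 2013
probSim = mean(fullTime >= cutFull & partTime <= cutPart)

# Expected waiting time in years for an event with monthly probability p
yearsBetween = function(p) 1 / p / 12

probSim
yearsBetween(probSim)
yearsBetween(0.007707)  # the pmvnorm result quoted above
```

The simulated share should land close to the 0.7707% reported above, with some Monte Carlo noise and rounding error.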

We contrast it with the outlier on the other side of the distribution, which looks like a rebasing in January 1994, as we have a decrease of 2078.8397 in the number of full-time jobs and an increase of 2576.9469 in the number of part-time jobs.

We get the following results:

whichPT = which.max(dfDL$PartTime)
cutOffs2 = dfDL[whichPT, ]
##            FullTime PartTime
## 1994-01-01 -0.02117   0.1086
probEvent2 = pmvnorm(lower = unlist(c(-Inf, cutOffs2[2])), upper = unlist(c(cutOffs2[1], 
    Inf)), mean = mu, sigma = sigma)

And the result is \(1.4809 \times 10^{-20}\)%, which is a “not in the lifetime of the universe” kind of likelihood.

So, the answer to the leading question – is the paranoia justified this time – is no. This month's development is not statistically unlikely, especially in contrast to a genuine man-made event.

Posted in General

On the negative basis trade

Repost: On negative basis trade

This is a repost of something I wrote in May 2010. The reason for the repost is that I want to use the power of Markdown to make it look “nice”.

On FT Alphaville there was a discussion on the persistence of a gap between the price of CDS on Greek Government Bonds (GGB) and the spot-market price of GGB: as noted by Barclays Capital, the GGB spreads have widened relative to CDS premiums, allowing an arbitrage trade without default risk from the issuer of the bond.[1]
One possible explanation for the persistence of the gap between CDS premia and the spot market may be counterparty risk. First, approximate the probability of default of GGB by the spread:

\[\Pr(D_{GGB})=r_{GGB} - r_f\]

The value of the CDS from the perspective of a buyer is – for some value of recovery rate \(R\), assuming no counterparty risk:

\[CDS = (1-R)\cdot \Pr(D_{GGB})\]

If we allow for counterparty risk \(\Pr(D_{CP})\), this becomes:

\[CDS = (1-R)\cdot \Pr(D_{GGB}) \cdot (1-\Pr(D_{CP}))\]

Now, if we allow for the weak form of the efficient market hypothesis, a widening gap between GGB spreads on the spot market and CDS premia could have two sources: a rise in the expected recovery rate, or a rise in the perceived counterparty risk. Since S&P has set a low expected recovery rate (between 30% and 50%, far below the average recovery rate for sovereign defaults), this persistent and widening gap is only consistent with a rise in counterparty risk.
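A small numerical sketch may help to see the two channels. It reads the last factor of the formula as the probability that the counterparty does not default. All numbers are hypothetical and chosen only for illustration, and the function name `cdsValue` is made up.

```r
# Value of the CDS to the buyer under the simple model above
cdsValue = function(pdBond, recovery, pdCounterparty = 0) {
    (1 - recovery) * pdBond * (1 - pdCounterparty)
}

pdBond = 0.08    # assumed bond spread r_GGB - r_f (hypothetical)
recovery = 0.4   # within the 30-50% range mentioned above

cdsValue(pdBond, recovery)                         # no counterparty risk
cdsValue(pdBond, recovery, pdCounterparty = 0.10)  # protection seller may default
cdsValue(pdBond, recovery = 0.5)                   # higher expected recovery
```

Both a higher recovery rate and a higher counterparty default probability lower the value of the CDS relative to the bond spread, which is why the argument has to rule out one of them.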

[1]: One could buy the bond and buy default protection (a CDS) and cash in the difference between the bond interest rate and the insurance premium.

Posted in Theory

A few more experiments

So it works. Let's see what else is possible.

\[ a^2 + b^2 = c^2 \]

\[ \frac{\sqrt{a+b}}{1-(a+b)} \]

Let's see how the tables come out.


pander(ks.test(rnorm(20, 1, 1), rnorm(20, 0, 2)))
Test statistic   P value     Alternative hypothesis
0.45             0.03354 *   two-sided

Table: Two-sample Kolmogorov-Smirnov test: rnorm(20, 1, 1) and rnorm(20, 0, 2)

pander(ks.test(runif(300), rnorm(30)))
Test statistic   P value           Alternative hypothesis
0.3967           0.0002404 * * *   two-sided

Table: Two-sample Kolmogorov-Smirnov test: runif(300) and rnorm(30)

          Coordinates   Density values
1st Qu.   -1.46         0.0217
3rd Qu.   2.22          0.254

Table: Kernel density of rnorm(100) (bandwidth: 0.3356)

Self-promotion: GitHub

This morning I was so excited about the capabilities of knitr, pandoc, markdown and so on that I uploaded an example of their versatility to GitHub. These packages let you produce a Word document, a PDF typeset in LaTeX, and an HTML page from a single template, without having to make visual compromises. An excellent example of the separation of model and view.

Posted in General

Active again after a long break

A new post after three years. Really just to try out whether I can post from within R, and what that looks like.

##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120

You can also embed plots, for example:

[Plot]

Posted in General