PS: Clustering whiskies

I played some more with the app from yesterday and deployed it with a more useful user interface and some new functionality at http://ojessen.shinyapps.io/whiskyTastingsApp/. By the way, thanks to the guys at rstudio.com for hosting the app on their servers.


Clustering whiskies by taste

Lately I have been wondering how to integrate my WordPress blog with a shiny app. This post is an example of how the two platforms can work together, following the advice from this thread in the shiny Google group. Looking for a good example for an app, I stumbled upon this article by Luba Gloukhov on the Revolution blog, and I shamelessly copied most of the code from Luba. Obviously the layout of this blog is not ideal for the layout of the app, but I will sort that out another day.

You can find the code for the app on github. I also made a standalone app with some more functions at http://ojessen.shinyapps.io/whiskyTastingsApp/.

Analysis of a barrier bond

Today I stumbled upon an interesting offer from Landesbank Berlin: a barrier bond referencing Daimler-Benz. The product promises 7% p.a. over its term, which is paid in any case. The interesting part is the redemption amount, which depends on the strike price of the share at the time of issue (= 100%) and a barrier at 70% of the strike: the bond is redeemed at 100% if the share price always stayed above the barrier, or if the price dropped below the barrier at some point but is at or above the strike at the valuation date. If the price fell below the barrier during the term and is below the strike at valuation, the holder receives compensation in Daimler-Benz shares instead.

The term is two years, from 27 Nov 2013 to 27 Nov 2015, with the valuation taking place on 20 Nov 2015.

The question I am asking myself is: what is the expected value of this deal? Given its complexity, a stochastic simulation seems like a suitable approach.

First, I define the payout function at maturity:

# Payout at maturity; all prices are normalized so that the strike equals 1.
# last() comes from the xts package (loaded via quantmod).
payoutEnd = function(basis, barriere, kurse)
{
  basis = as.numeric(basis)
  barriere = as.numeric(barriere)
  kurse = as.numeric(kurse)
  if(all(kurse >= barriere))
  {
    # the price never dropped below the barrier: redeem at 100%
    res = kurse[1]
  } else if(last(kurse) >= basis)
  {
    # barrier breached, but the final price is at or above the strike: redeem at 100%
    res = kurse[1]
  } else
  {
    # barrier breached and the final price is below the strike: repayment in shares
    res = last(kurse)
  }
  return(res)
}

I pull the data from finance.yahoo.com:

## [1] "DAI.DE"

The Daimler price history can be described as a decade-long sideways movement with large swings:

chartSeries(DAI.DE)

[Plot: chartSeries of DAI.DE]

For the analysis I use a bootstrap approach: I randomly draw a start date and determine which payout I would receive two years later. For simplicity, I ignore the fact that the valuation takes place one week before maturity.

Here is the algorithm:

funBootstrap = function(basisPerc, barrierePerc, kurseKomplett)
{
  # Draw a random start date that leaves at least two years of data;
  # startDate is initialized to the last date so the loop runs at least once
  startDate = last(index(kurseKomplett))
  lastPossibleDate = seq(startDate, by = "-2 years", length.out = 2)[2]
  while(startDate >= lastPossibleDate)
  {
    startDate = sample(index(kurseKomplett), 1)
  }

  endDate = seq(startDate, by = "2 years", length.out = 2)[2]
  kurse = as.numeric(kurseKomplett[paste0(startDate, "/", endDate)])
  # normalize prices to the first day, so the strike corresponds to 1
  res = payoutEnd(basisPerc, barrierePerc, kurse/kurse[1])
  return(res)
}

And here are the results for the redemption over 10,000 replications.

payouts = replicate(10000, funBootstrap(1, 0.7, DAI.DE[,3]))  # column 3: the daily low

summary(payouts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3091  0.7225  1.0000  0.8538  1.0000  1.0000
hist(payouts)

[Plot: histogram of payouts]

quantile(payouts,0.05)
##        5% 
## 0.4604145

Looking at the redemption alone, I would get my principal back in more than half of the cases, but on average I would have to expect a redemption of only about 85.4%. In 5% of the cases I would get back only about 46.0%.

For the investment decision, however, what matters is the present value of the entire cash flow. In the following I assume a risk-free investment alternative of 2%, which for two years is certainly optimistic, especially for retail investors.

riskfree = 0.02
# present value: the two coupons are discounted; the redemption amount enters at face value
dcf = 0.07/(1+riskfree) + 0.07/(1+riskfree)^2 + payouts
hist(dcf)

[Plot: histogram of dcf]

summary(dcf)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4450  0.8584  1.1360  0.9897  1.1360  1.1360
quantile(dcf,0.05)
##        5% 
## 0.5963238

In this view, the mean present value of the cash flow is about 99.0%, so despite the 7% coupon the investor would have to expect a loss of about 1.0%.

The expected value of the deal, based on the past 13-odd years, is therefore negative. Should one therefore advise against the investment from the outset? That depends largely on one's assessment of the future market: if you believe that Daimler will at least not fall more than 30% below the current price over the next two years, this is certainly a way to earn a comparatively high coupon.

Moreover, the model assumes that the share price will behave in principle as it did over the last 13 years. That period was characterized by two large crashes, in which the price fell by two thirds and three quarters, respectively. If you share the view that the central banks will stick to their policy of quantitative easing for the next two years, then no massive market slump is to be expected during that time.

Payday lending – how low can they go?

A post from TMM referenced the British payday lender wonga. As this form of credit is either uncommon or illegal in Germany, a word about their business model: they give out very small loans (upper limit 400 pounds) for a short time, up to 30 days. For this they charge 5.50 pounds in fees and roughly 1% interest per day, based on a yearly rate of 365%.
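
To put numbers on it, a quick back-of-the-envelope calculation using the fee and rate quoted above:

loan = 400                        # maximum loan size in pounds
days = 30                         # maximum duration in days
fee = 5.5                         # fixed fee in pounds
interest = loan * 0.01 * days     # roughly 1% interest per day
loan + fee + interest             # 525.50 pounds to repay after 30 days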

Nice business, if you can get it. What piqued my interest was the question: how much in losses can they take on these loans and still make a decent return on investment? After fiddling around with numbers I pulled from thin air, the result is: they can trawl the bottom of the sea and still make a living. Here is my reasoning:

I simulate a portfolio of 100,000 loans within a business year of 365 days, ignoring weekends and holidays. Loan size and duration are uniformly distributed within the limits set by wonga. I assign each borrower a probability of default between 80% and 100%. I will define the time reference in which this probability applies in the next step, so the value can be interpreted as "the borrower will go bust within the next x days with likelihood y". I further assume an equity of 1 million pounds to start the business, which ensures that the lender himself does not go into the red.

numYears = 1
numCost = 100000*numYears   # number of loans
numDays = 365*numYears
equity = 1e6

credits = data.frame(id = 1:numCost,
                     loan = round(runif(numCost, min = 1, max = 400), 0),
                     duration = as.integer(runif(numCost, min = 1, max = 30)),
                     loanday = as.integer(runif(numCost, min = 1, max = numDays)),
                     pd = runif(numCost, min = 0.8, max = 1))

credits$repayday = credits$loanday + credits$duration

I calculate the value of the outstanding amount on the repayment day as the size of the loan plus 1% interest per day of duration. I assume that the fixed fee of 5.50 pounds covers the fixed costs of the credit. To calculate the actual repayment made to the lender, I make a random draw to simulate the possibility of default, where the base probability of default is weighted with the duration of the credit relative to the time reference chosen for the probability of default. Usually this is one year, but I will vary it in the analysis.

In the next step I track the daily cash flows and the resulting cash balance during the business year, and calculate the return on investment as the balance at the end of the year divided by the equity.

# Note: uses the globals numCost, numDays and equity defined above
calcRoi = function(credits, defaultDenum = 365)
{
  # Repayment = loan plus 1% interest per day, multiplied by a Bernoulli draw;
  # the default probability is pd scaled by duration relative to the time reference
  credits$repayment = (credits$loan * (1 + credits$duration/100)) *
    rbinom(numCost, 1, (1 - credits$pd * credits$duration/defaultDenum))

  cashflow = data.frame(day = 1:numDays, outflow = 0, inflow = 0, balance = 0)
  cashflow$balance[1] = equity
  for(day in 1:numDays)
  {
    cashflow$outflow[day] = sum(credits$loan[which(credits$loanday == day)])
    cashflow$inflow[day] = sum(credits$repayment[which(credits$repayday == day)])
    if(day > 1)
    {
      cashflow$balance[day] = cashflow$balance[day-1] + cashflow$inflow[day] - cashflow$outflow[day]
    }
  }
  (ROI = cashflow$balance[nrow(cashflow)]/equity)
}

I now calculate the ROI for different time references in the range from 1 month to 1 year and plot the result.
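
The code for this sweep did not survive in the post; here is a minimal sketch of what it presumably looked like (the grid of defaultDenum values and the plotting details are my assumptions):

# vary the time reference for the probability of default from 1 month to 1 year
defaultDenums = seq(30, 365, by = 5)
rois = sapply(defaultDenums, function(d) calcRoi(credits, defaultDenum = d))

plot(defaultDenums, rois, type = "l",
     xlab = "time reference for PD (days)", ylab = "ROI")
abline(h = 1.0, col = "red")   # break-even
abline(h = 1.1, col = "blue")  # 10% ROI
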
[Plot: ROI as a function of the PD time reference]

The red line is the break-even and the blue line marks 10% ROI.

As you can see, the payday lender makes a decent ROI of 10% even with an average probability of default of 90% within the next 3-4 months, and even if he writes off the defaulted loans completely. The crucial part of the business will thus be walking the fine line of selecting borrowers who are nearly bust, so they are desperate enough to apply for this kind of loan, but not yet bust, so that the lender does not lose too large a part of the credit portfolio.

Re: Data Paranoia – maybe justified this time?

Menzie Chinn responded to a post on Zero Hedge in which someone claimed that the September spread between the gain in full-time jobs and the loss in part-time jobs was unusually high. Menzie responded that this is well within expectations given the history of these time series. Thankfully, he gave links to the data on the FRED database. As I was wasting time anyway, I thought I would take a look at the data.

## [1] "LNS12600000" "LNS12500000"

[Plot: the two employment series]

So, we have a set of 549 monthly observations from January 1968 to September 2013. As in Menzie's piece, I take the first difference of the logs to get monthly rates of change.

dfDL = as.data.frame(apply(df, 2, function(x)diff(log(x))))
names(dfDL) = c("FullTime", "PartTime")
summary(dfDL)
##     FullTime             PartTime        
##  Min.   :-0.0211697   Min.   :-0.032792  
##  1st Qu.:-0.0006873   1st Qu.:-0.005594  
##  Median : 0.0013157   Median : 0.001420  
##  Mean   : 0.0011179   Mean   : 0.001760  
##  3rd Qu.: 0.0032261   3rd Qu.: 0.007903  
##  Max.   : 0.0148232   Max.   : 0.108558

We define the September 2013 values as cutoff points and select the data points with an even more extreme spread between full-time and part-time.

# Look up September 2013 by date, in case someone runs the code later with
# more data; the -1 accounts for the observation lost by diff()
sep13 = which(index(LNS12500000) == "2013-09-01") - 1
cutOffs = dfDL[sep13,]
lowerRight = dfDL[which(dfDL$FullTime > cutOffs$FullTime & 
                          dfDL$PartTime< cutOffs$PartTime),]

Looking at the scatter plot, we would expect a negative correlation between the two variables.
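
The plotting code is also missing from the post; here is a sketch of how such a scatter plot could be produced, with colors chosen to match the description below:

plot(dfDL$FullTime, dfDL$PartTime,
     xlab = "FullTime", ylab = "PartTime")
points(cutOffs$FullTime, cutOffs$PartTime, col = "red", pch = 19)         # September 2013
points(lowerRight$FullTime, lowerRight$PartTime, col = "blue", pch = 19)  # more extreme spreads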

[Plot: scatter plot of the monthly rates of change; September 2013 in red, more extreme spreads in blue]

And this is confirmed: the correlation is -0.39. So, at first glance, it does not seem unlikely that a large increase in full-time jobs should coincide with a large decrease in part-time jobs. In fact, given a high value for one series, a high value with the opposite sign for the other series is the expected result.
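
In code:

cor(dfDL$FullTime, dfDL$PartTime)
## [1] -0.3918705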

Identifying the empirical likelihood of the event: in the scatter plot, the red dot is September 2013 and the blue ones are larger spreads – 5 points, which gives us a frequency for this event of about 1.09%. The next step is to check whether the event is statistically more unlikely than its observed frequency suggests.
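
In code (my reconstruction; September 2013 itself is counted along with the 5 more extreme points):

nrow(lowerRight)                      # 5 months with a more extreme spread
(nrow(lowerRight) + 1) / nrow(dfDL)   # empirical frequency: 6/549, about 1.09%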

We assume a bivariate normal distribution with the observed mean vector and covariance matrix:

mu = colMeans(dfDL)
sigma = cov(dfDL)
dfSum = data.frame(Mean = mu, StDev = apply(dfDL,2,sd))
dfSum
##                 Mean       StDev
## FullTime 0.001117893 0.003495736
## PartTime 0.001760472 0.011738748

As we can see, the mean rate of the part-time series is roughly 60% higher than that of the full-time series. It also has a much larger standard deviation.

To calculate the probability of the joint event that the full-time rate in a given month is ≥ 0.3702% and that the part-time rate in a given month is ≤ -1.1012%, we put the numbers into R:

library(mvtnorm)
probEvent = pmvnorm(lower = unlist(c(cutOffs[1], -Inf)),
                    upper = unlist(c(Inf, cutOffs[2])),
                    mean = mu, sigma = sigma)

And the result is 6.24% – or about once every 1.34 years – which is in the same ballpark as the observed frequency of 1.09%, or once every 7.6 years.

We contrast this with the outlier on the other side of the distribution, which looks like a re-basing of the series in January 1994: a decrease of about 2,079 in the number of full-time jobs and an increase of about 2,577 in the number of part-time jobs (the series are reported in thousands of persons).

For this event we get the following results:

whichPT = which.max(dfDL$PartTime)
cutOffs2 = dfDL[whichPT,]
cutOffs2
##               FullTime  PartTime
## 1994-01-01 -0.02116966 0.1085579
probEvent2 = pmvnorm(lower = unlist(c(-Inf, cutOffs2[2])),
                     upper = unlist(c(cutOffs2[1], Inf)),
                     mean = mu, sigma = sigma)

And the result is 6.1 × 10^-21 % – which is a "not in the lifetime of the universe" kind of likelihood.

So, the answer to the leading question – is the paranoia justified this time – is no. This month’s development is not statistically unlikely, especially in contrast to a genuine man-made event.