Volatility Risk Premium Strategy Algorithm

 

This post will be about comparing strategies from the paper “Easy Volatility Investing”, along with a demonstration of R’s table.Drawdowns command.

First off, before going further: while I think the execution assumptions found in EVI don't lend the strategies well to actual live trading (and their risk/reward tradeoffs also leave a lot of room for improvement), I think these strategies are great as benchmarks.

So, some time ago, I did an out-of-sample test for one of the strategies found in EVI, which can be found here.

Using the same source of data, I also obtained data for SPY (though, again, AlphaVantage can also provide this service for free for those that don’t use Quandl).

 

So, an explanation: there are four return streams here: buy and hold XIV, the DDN momentum from a previous post, and two other strategies.

The simpler one, called the VRatio, is simply the ratio of the VIX over the VXV. Near the close, check this quantity: if it is less than one, buy XIV; otherwise, buy VXX.

The other one, called the Volatility Risk Premium strategy (VRP for short), takes the VIX minus the 10-day historical volatility of the S&P 500 (that is, the annualized running ten-day standard deviation), and then a 5-day moving average of that difference. Near the close, when that average is above zero (that is, when the VIX is higher than historical volatility), go long XIV; otherwise, go long VXX.

Again, all of these strategies are effectively “observe near/at the close, buy at the close”, so are useful for demonstration purposes, though not for implementation purposes on any large account without incurring market impact.
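For reference, the two signal rules described above can be sketched outside R as well. A minimal Python/pandas version (the column names 'vix', 'vxv', and 'spy' are hypothetical; signal convention: True means long XIV, False means long VXX, observed near the close):

```python
import pandas as pd
import numpy as np

def vol_signals(df):
    """Return (VRatio signal, VRP signal) as boolean series: True -> long XIV.

    df is assumed to hold daily closes in columns 'vix', 'vxv', 'spy'.
    """
    # VRatio: VIX/VXV below one -> long XIV
    vratio_long_xiv = (df['vix'] / df['vxv']) < 1

    # VRP: VIX minus 10-day annualized realized vol of SPY (population sd,
    # matching runSD(..., sample = FALSE)), smoothed with a 5-day SMA
    spy_rets = df['spy'].pct_change()
    hist_vol = spy_rets.rolling(10).std(ddof=0) * np.sqrt(252) * 100
    vrp = (df['vix'] - hist_vol).rolling(5).mean()
    vrp_long_xiv = vrp > 0
    return vratio_long_xiv, vrp_long_xiv
```

As in the R version, the signals would be lagged by one day before being multiplied against the XIV/VXX return streams.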

Here are the results, since 2011 (that is, around the time of XIV’s actual inception):

 

To note, both the momentum and the VRP strategy underperform buying and holding XIV since 2011. The VRatio strategy, on the other hand, does outperform.

Here’s a summary statistics function that compiles some top-level performance metrics.

 

Note that the table.Drawdowns command only examines one return stream at a time. Furthermore, the top argument specifies how many drawdowns to look at, sorted by greatest drawdown first.

One reason I think these strategies suffer the drawdowns they do is that they're either all-in on one asset or all-in on its exact opposite, with no room for error.

One last thing, for the curious: here is my strategy (which I have been trading with live capital since September, and for which I have recently opened a subscription service) since 2011, essentially XIV inception, benchmarked against the strategies in EVI:

Link: https://www.r-bloggers.com/comparing-some-strategies-from-easy-volatility-investing-and-the-table-drawdowns-command/

 

 

### Source ###

require(downloader)
require(quantmod)
require(PerformanceAnalytics)
require(TTR)
require(Quandl)
require(data.table)

download("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vix3mdailyprices.csv", destfile="vxvData.csv")

VIX <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv", skip = 1)
VIXdates <- VIX$Date
VIX$Date <- NULL; VIX <- xts(VIX, order.by=as.Date(VIXdates, format = '%m/%d/%Y'))

vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))

ma_vRatio <- SMA(Cl(VIX)/Cl(vxv), 10)
xivSigVratio <- ma_vRatio < 1
vxxSigVratio <- ma_vRatio > 1

# V-ratio (VIX/VXV)
# xivRets and vxxRets are the XIV and VXX return streams from the previous post
vRatio <- lag(xivSigVratio) * xivRets + lag(vxxSigVratio) * vxxRets
# vRatio <- lag(xivSigVratio, 2) * xivRets + lag(vxxSigVratio, 2) * vxxRets

# Volatility Risk Premium Strategy
spy <- Quandl("EOD/SPY", start_date='1990-01-01', type = 'xts')
spyRets <- Return.calculate(spy$Adj_Close)
histVol <- runSD(spyRets, n = 10, sample = FALSE) * sqrt(252) * 100
vixDiff <- Cl(VIX) - histVol
maVixDiff <- SMA(vixDiff, 5)

vrpXivSig <- maVixDiff > 0
vrpVxxSig <- maVixDiff < 0
vrpRets <- lag(vrpXivSig, 1) * xivRets + lag(vrpVxxSig, 1) * vxxRets

obsCloseMomentum <- magicThinking # from previous post

compare <- na.omit(cbind(xivRets, obsCloseMomentum, vRatio, vrpRets))
colnames(compare) <- c("BH_XIV", "DDN_Momentum", "DDN_VRatio", "DDN_VRP")

stratStats <- function(rets) {
  stats <- rbind(table.AnnualizedReturns(rets), maxDrawdown(rets))
  stats[5,] <- stats[1,]/stats[4,]
  stats[6,] <- stats[1,]/UlcerIndex(rets)
  rownames(stats)[4] <- "Worst Drawdown"
  rownames(stats)[5] <- "Calmar Ratio"
  rownames(stats)[6] <- "Ulcer Performance Index"
  return(stats)
}

> stratStats(compare['2011::'])
BH_XIV DDN_Momentum DDN_VRatio DDN_VRP
Annualized Return 0.3801000 0.2837000 0.4539000 0.2572000
Annualized Std Dev 0.6323000 0.5706000 0.6328000 0.6326000
Annualized Sharpe (Rf=0%) 0.6012000 0.4973000 0.7172000 0.4066000
Worst Drawdown 0.7438706 0.6927479 0.7665093 0.7174481
Calmar Ratio 0.5109759 0.4095285 0.5921650 0.3584929
Ulcer Performance Index 1.1352168 1.2076995 1.5291637 0.7555808

> table.Drawdowns(compare[,1]['2011::'], top = 5)
From Trough To Depth Length To Trough Recovery
1 2011-07-08 2011-11-25 2012-11-26 -0.7439 349 99 250
2 2015-06-24 2016-02-11 2016-12-21 -0.6783 379 161 218
3 2014-07-07 2015-01-30 2015-06-11 -0.4718 236 145 91
4 2011-02-15 2011-03-16 2011-04-20 -0.3013 46 21 25
5 2013-04-15 2013-06-24 2013-07-22 -0.2877 69 50 19
> table.Drawdowns(compare[,2]['2011::'], top = 5)
From Trough To Depth Length To Trough Recovery
1 2014-07-07 2016-06-27 2017-03-13 -0.6927 677 499 178
2 2012-03-27 2012-06-13 2012-09-13 -0.4321 119 55 64
3 2011-10-04 2011-10-28 2012-03-21 -0.3621 117 19 98
4 2011-02-15 2011-03-16 2011-04-21 -0.3013 47 21 26
5 2011-06-01 2011-08-04 2011-08-18 -0.2723 56 46 10
> table.Drawdowns(compare[,3]['2011::'], top = 5)
From Trough To Depth Length To Trough Recovery
1 2014-01-23 2016-02-11 2017-02-14 -0.7665 772 518 254
2 2011-09-13 2011-11-25 2012-03-21 -0.5566 132 53 79
3 2012-03-27 2012-06-01 2012-07-19 -0.3900 80 47 33
4 2011-02-15 2011-03-16 2011-04-20 -0.3013 46 21 25
5 2013-04-15 2013-06-24 2013-07-22 -0.2877 69 50 19
> table.Drawdowns(compare[,4]['2011::'], top = 5)
From Trough To Depth Length To Trough Recovery
1 2015-06-24 2016-02-11 2017-10-11 -0.7174 581 161 420
2 2011-07-08 2011-10-03 2012-02-03 -0.6259 146 61 85
3 2014-07-07 2014-12-16 2015-05-21 -0.4818 222 115 107
4 2013-02-20 2013-07-08 2014-06-10 -0.4108 329 96 233
5 2012-03-27 2012-06-01 2012-07-17 -0.3900 78 47 31

stratStats(compare['2011::'])
QST_vol BH_XIV DDN_Momentum DDN_VRatio DDN_VRP
Annualized Return 0.8133000 0.3801000 0.2837000 0.4539000 0.2572000
Annualized Std Dev 0.3530000 0.6323000 0.5706000 0.6328000 0.6326000
Annualized Sharpe (Rf=0%) 2.3040000 0.6012000 0.4973000 0.7172000 0.4066000
Worst Drawdown 0.2480087 0.7438706 0.6927479 0.7665093 0.7174481
Calmar Ratio 3.2793211 0.5109759 0.4095285 0.5921650 0.3584929
Ulcer Performance Index 10.4220721 1.1352168 1.2076995 1.5291637 0.7555808

 

 

A/B Test in R

 

Definition:  A/B testing is a method for comparing the effectiveness of several different variations of a web page. A/B Testing is a simple form of hypothesis testing with one control group and one treatment group.

Usage:

Formula:

Example:

  • An online clothing retailer that specializes in men's streetwear may want to examine whether a black or a pink background results in more purchases from visitors to the site. After running the experiment for one week, we find that the pink background resulted in a 40% purchase rate with 500 visitors, while the black background resulted in a 30% purchase rate with 550 visitors. So which background is more effective at generating purchases from visitors to the online store? One way to examine this problem is by calculating confidence intervals for the conversion rate of each variation of the site.
  • In the following R code, I construct a function which calculates the confidence intervals for the purchase rate of each site at the 80% confidence level. In this example, the purchase rate for the pink background is significantly higher than the purchase rate for the black background.

Source:

site1 = c(.40, 500) # pink
site2 = c(.30, 550) # black

abtestfunc <- function(ad1, ad2){
  sterror1 = sqrt( ad1[1] * (1-ad1[1]) / ad1[2] )
  sterror2 = sqrt( ad2[1] * (1-ad2[1]) / ad2[2] )
  minmax1 = c((ad1[1] - 1.28*sterror1) * 100, (ad1[1] + 1.28*sterror1) * 100)
  minmax2 = c((ad2[1] - 1.28*sterror2) * 100, (ad2[1] + 1.28*sterror2) * 100)
  print( round(minmax1,2) )
  print( round(minmax2,2) )
}

abtestfunc(site1, site2)

> abtestfunc(site1, site2)
[1] 37.2 42.8 # pink
[1] 27.5 32.5 # black

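The same interval arithmetic is easy to check in any language. Here is a minimal Python equivalent of abtestfunc (the 1.28 multiplier is the z-value for an approximately 80% two-sided confidence interval):

```python
from math import sqrt

def ab_interval(rate, n, z=1.28):
    """Return (low, high) confidence bounds for a conversion rate, in percent."""
    se = sqrt(rate * (1 - rate) / n)  # standard error of the sample proportion
    return round((rate - z * se) * 100, 1), round((rate + z * se) * 100, 1)

print(ab_interval(0.40, 500))  # pink background
print(ab_interval(0.30, 550))  # black background
```

The two intervals do not overlap, which is what lets us call the pink background's purchase rate significantly higher at this confidence level.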

Source 2:

# First collection
control_1 <- rbinom(20, 1, 0.5)
treatment_1 <- rbinom(20, 1, 0.3)


# First Analysis

library(bayesAB)
test1 <- bayesTest(treatment_1, control_1, distribution = "bernoulli", priors = c("alpha" = 10, "beta" = 10))
print(test1)
summary(test1)
plot(test1)

 

# The treatment posterior distribution is in red and the control posterior is in green. After 40 observations in total, the posteriors have started to separate, and the probability that the treatment is less than the control is approaching 95 percent.

# Let’s simulate 20 more observations for each group and compare.

# Second Collection
control_2 <- c(control_1, rbinom(20, 1, 0.5))     # c(), not rbind(), keeps these as vectors
treatment_2 <- c(treatment_1, rbinom(20, 1, 0.3))

Link: https://www.r-bloggers.com/ab-testing-in-r-%E2%80%93-part-1/

# Second Analysis
test2 <- bayesTest(treatment_2, control_2, distribution = "bernoulli", priors = c("alpha" = 10, "beta" = 10))
print(test2)
summary(test2)
plot(test2)

# We can see that with the additional 40 observations, the distributions have separated more, and the probability that the treatment is less than the control is 98 percent.
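The Beta-Bernoulli update that bayesTest performs can be reproduced directly. A sketch with NumPy (same priors, alpha = beta = 10; the success counts used below are illustrative, not the random draws above):

```python
import numpy as np

def prob_a_less_than_b(succ_a, n_a, succ_b, n_b, alpha=10, beta=10,
                       draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_A < rate_B) under Beta posteriors.

    With a Beta(alpha, beta) prior and binomial data, the posterior is
    Beta(alpha + successes, beta + failures).
    """
    rng = np.random.default_rng(seed)
    post_a = rng.beta(alpha + succ_a, beta + n_a - succ_a, draws)
    post_b = rng.beta(alpha + succ_b, beta + n_b - succ_b, draws)
    return float((post_a < post_b).mean())

# e.g. treatment with 6/20 successes vs control with 10/20
p = prob_a_less_than_b(6, 20, 10, 20)
```

As more observations arrive, the posteriors tighten and this probability moves toward 0 or 1, which is the separation effect described above.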

Link:  https://www.r-bloggers.com/bayesian-ab-testing-made-easy/

 

 

Naive Bayesian Classification

Using data science with A/B tests: Bayesian analysis: https://econsultancy.com/blog/65755-using-data-science-with-a-b-tests-bayesian-analysis

 

An Example of Classification Using the Naive Bayes Algorithm

Suppose we have the following five training documents, with two classes: comedy (comedy films) and action (action films).

movie  words                     class
1      fun,couple,love,love      Comedy
2      fast,furious,shoot        Action
3      couple,fly,fast,fun,fun   Comedy
4      furious,shoot,shoot,fun   Action
5      fly,fast,shoot,love       Action

Now, given a document containing only the three words "fun,furious,fast", let's determine whether it is a comedy or an action film.

The probability that the film is a comedy is

P(comedy|words) = P(words|comedy)*P(comedy)/P(words)  (A.1)

and the probability that it is an action film is

P(action|words) = P(words|action)*P(action)/P(words)  (A.2)

If A.1 > A.2, we classify it as a comedy; if A.1 < A.2, as an action film.

Since A.1 and A.2 are both divided by the same P(words), and we only compare their relative sizes, there is no need to compute P(words) at all; it suffices to compute

  • P(comedy|words) ∝ P(words|comedy)*P(comedy)  (B.1)
  • P(action|words) ∝ P(words|action)*P(action)  (B.2)

Let's actually compute the values of B.1 and B.2.

First, count the word frequencies:

  • Count(fast, comedy) = 1 (the number of times "fast" appears in comedy documents)
  • Count(furious, comedy) = 0
  • Count(fun, comedy) = 3
  • Count(fast, action) = 2
  • Count(furious, action) = 2
  • Count(fun, action) = 1

P(words|comedy) is the probability of the given words appearing in comedy films. Expanded over the individual words it is P(fast,furious,fun|comedy), and if the words are mutually independent (this is called conditional independence), it can be computed as

P(fast|comedy)*P(furious|comedy)*P(fun|comedy).

Computing this: comedy documents contain 9 words in total, so the denominator is 9, and among those words "fast" appears in comedy 1 time, "furious" 0 times, and "fun" 3 times, so

P(fast|comedy)*P(furious|comedy)*P(fun|comedy) = (1/9) * (0/9) * (3/9).

And since 2 of the 5 films are comedies, P(comedy) = 2/5.

Substituting into B.1:

P(comedy|words) = { (1/9) * (0/9) * (3/9) } * 2/5 = 0.

Computing B.2 the same way: action documents contain 11 words in total, and using each word's frequency,

P(action|words) = { (2/11) * (2/11) * (1/11) } * 3/5 ≈ 0.0018.

Since P(action|words) ≈ 0.0018 > P(comedy|words) = 0, the document is classified as an action film.
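As a check, the unsmoothed computation above is only a few lines in Python (the word counts are taken straight from the training table):

```python
# per-class word counts from the five training documents
comedy = {'fun': 3, 'couple': 2, 'love': 2, 'fly': 1, 'fast': 1}             # 9 words total
action = {'fast': 2, 'furious': 2, 'shoot': 4, 'fun': 1, 'fly': 1, 'love': 1}  # 11 words total

def score(words, counts, prior):
    """Unsmoothed naive Bayes score: prior * product of P(word|class)."""
    total = sum(counts.values())
    p = prior
    for w in words:
        p *= counts.get(w, 0) / total  # unseen word -> factor of 0
    return p

doc = ['fun', 'furious', 'fast']
p_comedy = score(doc, comedy, 2/5)   # 0.0, because "furious" never occurs in comedy
p_action = score(doc, action, 3/5)   # ~0.0018
```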

Laplace smoothing

One problem with the naive Bayes algorithm above arises when a word that never occurred in the training data shows up. Say the document to classify contains the word "cars". Since "cars" is absent from the training data, P(cars|comedy) and P(cars|action) are both 0, so P(comedy|words) and P(action|words) both come out 0 and classification becomes impossible.

That is, if the document becomes "fun,furious,fast,cars":

P(comedy|words) = { (1/9) * (0/9) * (3/9) * (0/9: probability of "cars") } * 2/5 = 0

P(action|words) = { (2/11) * (2/11) * (1/11) * (0/11: probability of "cars") } * 3/5 = 0

The technique that solves this is called smoothing: add 1 to every word count, so that an unseen word can no longer drive the whole probability to 0.

Let's look at P(x|c) again. So far we computed it as

P(w|c) = Count(w, c) / (total number of words appearing in class c)

that is, P("fun"|comedy) = (number of times "fun" appears in comedy) / (total number of words in comedy documents).

To apply Laplace smoothing, add 1 to each count; the formula becomes

P(w|c) = (Count(w, c) + 1) / (total number of words in class c + |V|)

where |V| is the number of distinct words in the training data. In the training data above these are fun, couple, love, fast, furious, shoot, and fly, so |V| = 7.

Adding these to the numerators and denominators:

P(comedy|words) = { ((1+1)/(9+7)) * ((0+1)/(9+7)) * ((3+1)/(9+7)) * ((0+1)/(9+7): "cars") } * 2/5 ≈ 0.000049

P(action|words) = { ((2+1)/(11+7)) * ((2+1)/(11+7)) * ((1+1)/(11+7)) * ((0+1)/(11+7): "cars") } * 3/5 ≈ 0.000103

so the document is again classified as an action film.

Preventing underflow with logs

A remaining issue with this algorithm is that P(words|comedy) and P(words|action) are computed as products of per-word probabilities, P(fun|comedy)*P(furious|comedy)*...; since each probability is less than 1, with many terms the product can shrink so far below the decimal point that the comparison becomes unreliable.

This can be solved by using logarithms. Since log(a*b) = log(a) + log(b), and we only compare magnitudes anyway, we can take the log of both formulas

  • P(comedy|words) ∝ P(words|comedy)*P(comedy)  (B.1)
  • P(action|words) ∝ P(words|action)*P(action)  (B.2)

to get

  • log(P(comedy|words)) = log(P(words|comedy)*P(comedy))  (B.1)
  • log(P(action|words)) = log(P(words|action)*P(action))  (B.2)

Expanding B.1 a bit further:

log(P(words|comedy)*P(comedy))

= log(P(fun|comedy)*P(furious|comedy)*...*P(comedy))

= log(P(fun|comedy)) + log(P(furious|comedy)) + ... + log(P(comedy)).
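Putting Laplace smoothing and the log trick together, the whole worked example fits in a short sketch (|V| = 7 for the training set above; the smoothed posteriors work out to roughly 0.000049 for comedy and 0.000103 for action):

```python
from math import log, exp

comedy = {'fun': 3, 'couple': 2, 'love': 2, 'fly': 1, 'fast': 1}             # 9 words total
action = {'fast': 2, 'furious': 2, 'shoot': 4, 'fun': 1, 'fly': 1, 'love': 1}  # 11 words total
VOCAB_SIZE = 7  # fun, couple, love, fast, furious, shoot, fly

def log_score(words, counts, prior):
    """Laplace-smoothed naive Bayes score, accumulated in log space."""
    total = sum(counts.values())
    s = log(prior)
    for w in words:
        # add-one smoothing: unseen words get count 0 + 1
        s += log((counts.get(w, 0) + 1) / (total + VOCAB_SIZE))
    return s

doc = ['fun', 'furious', 'fast', 'cars']
lc = log_score(doc, comedy, 2/5)
la = log_score(doc, action, 3/5)
# la > lc, so the document is classified as action
```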

The material above references http://unlimitedpower.tistory.com/entry/NLP-Naive-Bayesian-Classification%EB%82%98%EC%9D%B4%EB%B8%8C-%EB%B2%A0%EC%9D%B4%EC%A6%88-%EB%B6%84%EB%A5%98

Source: http://bcho.tistory.com/1010 (조대협's blog)

A Machine Learning-Based Strategy for the USD/CAD

A Machine Learning-Based Strategy for the USD/CAD: https://inovancetech.com/arlstrat.html

 

 

In this post we will explore a particular type of data mining called association rule learning and use it to build a basic strategy for the USD/CAD. Here are the rules we discovered in the end:

Long Rules:
    • IF the CCI (20) is above -290 and below -100 AND
    • the RSI (3) is below 30 AND
    • the DEMA (10) cross is above -40 and below -20
  • THEN Long Signal

Short Rules:
    • IF the CCI (20) is above 185 and below 325 AND
    • the RSI (3) is above 50 AND
    • the DEMA (10) cross is above 10 and below 40
  • THEN Short Signal
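The discovered rules are plain threshold conditions, so applying them to a new bar is straightforward. A hypothetical sketch (indicator values are assumed to be precomputed elsewhere):

```python
def rule_signal(cci20, rsi3, dema10_cross):
    """Return 'long', 'short', or None according to the discovered rule set."""
    if -290 < cci20 < -100 and rsi3 < 30 and -40 < dema10_cross < -20:
        return 'long'
    if 185 < cci20 < 325 and rsi3 > 50 and 10 < dema10_cross < 40:
        return 'short'
    return None  # no rule fires
```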

Stock prediction using kNN.R

# Using kNN Classifier to Predict Whether the Price of Stock Will Increase
# https://www.r-bloggers.com/using-knn-classifier-to-predict-whether-the-price-of-stock-will-increase/

require(RCurl)
sURLNative <- "https://drive.google.com/open?id=131OOLfHMdibSzzqPQ6cB52oOiyKx4Hz0EOA4UbknxHc"
sID <- "131OOLfHMdibSzzqPQ6cB52oOiyKx4Hz0EOA4UbknxHc"

# fileUrl <- "https://docs.google.com/spreadsheets/d/%s/export?format=csv"
sURL <- sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv", sID)

sURI <- getURL(sURL, .opts=list(ssl.verifypeer=FALSE))
stocks <- read.csv(textConnection(sURI))
head(stocks)
library(class)
library(dplyr)
library(lubridate)
set.seed(100)
stocks$Date <- ymd(stocks$Date)
vTrainData <- year(stocks$Date) < 2014

predictors <- cbind(lag(stocks$Apple, default = 210.73),
                    lag(stocks$Google, default = 619.98),
                    lag(stocks$MSFT, default = 30.48))

prediction <- knn(predictors[vTrainData, ], predictors[!vTrainData, ], stocks$Increase[vTrainData], k = 1)

table(prediction, stocks$Increase[!vTrainData])

mean(prediction == stocks$Increase[!vTrainData])

accuracy <- rep(0, 10)
k <- 1:10
for(x in k){
  prediction <- knn(predictors[vTrainData, ], predictors[!vTrainData, ],
                    stocks$Increase[vTrainData], k = x)
  accuracy[x] <- mean(prediction == stocks$Increase[!vTrainData])
}

plot(k, accuracy, type = 'b')

# As we can see, the model has the highest accuracy of ~52.5% when k = 5.
# While this may not seem any good, it is often extremely hard to predict the price of stocks.
# Even the 2.5% improvement over random guessing can make a difference given the amount of money at stake.
# After all, if it was that easy to predict the prices, wouldn’t we all be trading in stocks for the easy money instead of learning these algorithms?
#
# That brings us to the end of this post. I hope you enjoyed it.
# Feel free to leave a comment or reach out to me on Twitter if you have questions!
# Note: If you are interested in learning more, I highly recommend reading Introduction to Statistical Learning (a pdf copy of the book is available for free on the website).

Machine Learning Tutorial in Python

#Source: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
# coding: utf-8

# In[4]:

import sys
print('Python: {}'.format(sys.version))
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# In[13]:

# Load libraries
import pandas
from pandas.plotting import scatter_matrix  # pandas.tools.plotting in older pandas
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
# In[16]:

# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
# In[17]:

dataset
# In[18]:

print(dataset.describe())
# In[19]:

print(dataset.groupby('class').size())
# In[20]:

dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
# In[21]:

dataset.hist()
plt.show()
# In[25]:

# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
Y = array[:,4]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
# In[26]:
scoring = 'accuracy'
# In[27]:

# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
# In[28]:

# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))
# In[ ]:

KNN Algorithm

knn {class} R Documentation
k-Nearest Neighbour Classification

Description

k-nearest neighbour classification for test set from training set. For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.

Usage

knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)
Arguments

train
matrix or data frame of training set cases.

test
matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.

cl
factor of true classifications of training set

k
number of neighbours considered.

l
minimum vote for definite decision, otherwise doubt. (More precisely, less than k-l dissenting votes are allowed, even if k is increased by ties.)

prob
If this is true, the proportion of the votes for the winning class are returned as attribute prob.

use.all
controls handling of ties. If true, all distances equal to the kth largest are included. If false, a random selection of distances equal to the kth is chosen to use exactly k neighbours.

Value

Factor of classifications of test set. doubt will be returned as NA.

References

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

See Also

knn1, knn.cv

Examples

train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
knn(train, test, cl, k = 3, prob=TRUE)
attributes(.Last.value)

Link: https://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html
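The majority-vote behavior the help page describes can be sketched without the class package. A minimal Python version (Euclidean distance; ties are simply broken by Counter ordering rather than at random, unlike the R implementation):

```python
from collections import Counter
from math import dist

def knn_predict(train, labels, x, k=3):
    """Classify x by majority vote among its k Euclidean-nearest training rows."""
    # indices of the k nearest training points
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], x))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```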

 

 

[kNN for Scanner]

# x <- read.csv("/tmp/Rtmp9iTI0O/data1e4b29a5bc95", row.names=1)
# View(`ASX200-20171030-172116`)
# x <- read.csv("/srv/shiny-server/temp/logs/ASX200-20171030-172116.txt")

library(dplyr) # for %>% and filter()
x <- read.csv("/srv/shiny-server/temp/logs/FULLLISTV4.csv")
# x2 <- (x %>% filter(MA_SIG != "No"))[, c(4, 6:15)]
x2 <- (x %>% filter(MA_SIG != "No"))
x2 <- x2[, c(4, 6:15)]

# Load in `ggvis`
library(ggvis)

# Iris scatter plot
# iris %>% ggvis(~Sepal.Length, ~Sepal.Width, fill = ~Species) %>% layer_points()
# x2 %>% ggvis(~RSI, ~Close, fill = ~MA_SIG) %>% layer_points()

# normalize() is defined below under "Normalizing numeric data"
x2$rsi_n <- normalize(x2$RSI)
x2$close_n <- normalize(x2$Close)
x2 %>% ggvis(~rsi_n, ~close_n, fill = ~MA_SIG) %>% layer_points()
# table(x2$MA_SIG)
# # Bearish Bullish No
# # 27 61 104
#
# x2$diagnosis <- factor(x2$MA_SIG, levels = c("Bullish", "Bearish"), labels = c("Bullish", "Bearish"))
#
# round(prop.table(table(x2$diagnosis)) * 100, digits = 1) # percentages rounded to 1 decimal place (hence digits = 1)
# # Bullish Bearish
# # 69.3 30.7

### Normalizing numeric data ###
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

x_n <- as.data.frame(lapply(x2[2:11], normalize))

summary(x_n$RSI)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.3788 0.5777 0.5653 0.7544 1.0000

summary(x_n$Close)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.00000 0.01574 0.02983 0.07441 0.07359 1.00000

### Cor ###

# Overall correlation between RSI and Close
cor(x2$RSI, x2$Close) # same: cor(x2$rsi_n, x2$close_n)
cor(x2$RSI, x2$MarketCap)
# Return the levels of MA_SIG
x <- levels(x2$MA_SIG)

# Correlation matrix for the first level (Bearish)
print(x[1])
cor(x2[x2$MA_SIG==x[1],2:11])

# Correlation matrix for the second level (Bullish)
print(x[2])
cor(x2[x2$MA_SIG==x[2],2:11])

# Correlation matrix for the third level (No)
print(x[3])
cor(x2[x2$MA_SIG==x[3],2:11])

######
# ind <- sample(2, nrow(x2), replace=TRUE, prob=c(0.67, 0.33))

# https://www.r-bloggers.com/using-knn-classifier-to-predict-whether-the-price-of-stock-will-increase/

### Creating training and test data set ###
# Divide the data set into training and test portions (first 100 rows train, the rest test).
x_train <- x_n[1:100,]
# x_test <- x_n[66:100,]
x_test <- x_n[101:nrow(x_n),]
x_train_labels <- x2[1:100, 1]
# x_test_labels <- x[66:nrow(x_n), 2]
x_test_labels <- x2[101:nrow(x2), 1]
# This takes the class factor in column 1 of x2 and creates the x_train_labels and x_test_labels vectors.
### Step 3 - Training a model on data ###
# install.packages("class")
library(class)
x_test_pred <- knn(train = x_train, test = x_test, cl = x_train_labels, k=10)
# The value for k is generally chosen as the square root of the number of observations.

# install.packages("gmodels")
library(gmodels)
CrossTable(x=x_test_labels, y=x_test_pred, prop.chisq = FALSE)

# Cell Contents
# |-------------------------|
# |                       N |
# |           N / Row Total |
# |           N / Col Total |
# |         N / Table Total |
# |-------------------------|
#
#
# Total Observations in Table: 752
#
#
#               | x_test_pred
# x_test_labels |   Bearish |   Bullish | Row Total |
# --------------|-----------|-----------|-----------|
#       Bearish |        93 |       112 |       205 |
#               |     0.454 |     0.546 |     0.273 |
#               |     0.732 |     0.179 |           |
#               |     0.124 |     0.149 |           |
# --------------|-----------|-----------|-----------|
#       Bullish |        34 |       513 |       547 |
#               |     0.062 |     0.938 |     0.727 |
#               |     0.268 |     0.821 |           |
#               |     0.045 |     0.682 |           |
# --------------|-----------|-----------|-----------|
#  Column Total |       127 |       625 |       752 |
#               |     0.169 |     0.831 |           |
# --------------|-----------|-----------|-----------|

Points to Note in Exclusive Supply Agreements

One thing a venture company cannot avoid is entering into license agreements or exclusive supply agreements.

A company that owns technology grants other companies a license to practice that technology under a royalty arrangement, or enters into agreements concerning the right to sell products embodying that technology.

A company with a sales network signs an exclusive distribution agreement to secure sole distribution rights while importing advanced foreign technology or products ahead of competitors.

The most common agreements between domestic and foreign companies are the Exclusive Distribution Agreement and the Technology Transfer Agreement.

An exclusive supply agreement is also called an exclusive distribution agreement from the seller's perspective.
Leaving technology transfer agreements for another time, here is a brief look at the points to watch when signing an exclusive distribution agreement with a foreign company.

First, the party granting exclusive distribution rights should verify that the counterparty has a sufficient sales network and sales capability, and the party receiving them must verify that the counterparty can actually grant the rights to the product in question.
In particular, even after such verification, the receiving party should ask that these confirmations be stated in the contract as Representations and Warranties.

Second, it is best to make clear whether the exclusive distribution right covers not only the contracted product but also improved models of it.
Left unclear, this may fail to prevent the grantor from supplying similar products to competitors during the contract term.

Third, the geographic scope, i.e. the Territory, must be specified.
"The Korean peninsula" may leave it unclear whether North Korea or islands such as Jeju are included.
Defining the territory by country name can therefore be simpler.

Fourth, the temporal scope, i.e. the contract term, must be specified.
It is common to set it at one year, automatically renewed each year absent special cause.
Setting it at two or three years from the outset is also worth considering.

Fifth, minimum sales volume requirements need to be handled wisely.
Depending on bargaining power, a distributor that fails to meet the minimum volume requirement often loses exclusivity or even forfeits the distribution right altogether.
It helps to negotiate using the point that, under Korea's Monopoly Regulation and Fair Trade Act, terminating a contract for failure to meet a minimum volume requirement may be unlawful.
In practice, however, a distributor that misses the minimum often retains only a non-exclusive distribution right.

Sixth, who leads advertising and promotion, and whether to impose on the distributor a duty to report sales status and customer information, are also important points.

Seventh, contracts often include a prohibition on handling competing products and restrictions on developing similar or improved products.
These too can be problematic under the Monopoly Regulation and Fair Trade Act, so that statute can be used as leverage.

Finally, which country's law governs disputes and in which forum they are resolved also matters.

Commercial arbitration is widely used these days, but if arbitration is not chosen, resolution under Korean law in Korean courts is recommended.
<Kim Kwon-hoe, attorney, kkim@ksy.co.kr>

 

Source: http://www.dt.co.kr/contents.html?article_no=20000725184228002