
Wednesday, 10 Aban 1396

How to create a relative distance matrix in R using string variables




I have a dataset of NPIs in which the information about each NPI is stored mostly in string variables.



I've simplified it for this example:



data <- as.data.frame(cbind(51:60, sample(1:10, 10, replace = TRUE), sample(1:10, 10, replace = TRUE), sample(1:10, 10, replace = TRUE)), stringsAsFactors = FALSE)
colnames(data) <- c("npi", "a", "b", "c")


For instance:



npi a b c
51 6 2 1
52 6 2 6
53 10 9 2
54 7 4 7
55 7 10 5
56 8 5 7
57 7 2 10
58 5 9 3
59 8 4 6
60 1 10 2


I want to create a distance matrix showing the relative distances between the different NPIs.
Two NPIs should have a large distance when they are not very similar and a small distance when they are very similar. By similar I mean that they share values on the variables. The variables in the real dataset are names and addresses, so I cannot simply use dist().



This is how I got the distance between two NPIs:



length(intersect(npi1, npi2)) / 3
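To make the expression above concrete, here is a minimal sketch using the first two rows of the example table. The vectors `npi1` and `npi2` are assumed to hold the values of columns a, b and c for NPIs 51 and 52. Note that the expression grows as the two rows share more values, so it behaves like a similarity; subtracting it from 1 is one possible way (an assumption, not part of the original attempt) to turn it into a distance.

```r
# Values of columns a, b, c for NPIs 51 and 52, copied from the example table
npi1 <- c(6, 2, 1)
npi2 <- c(6, 2, 6)

# Share of distinct values the two rows have in common (here: 6 and 2)
similarity <- length(intersect(npi1, npi2)) / 3

# Assumed conversion: a high similarity should give a small distance
distance <- 1 - similarity
```

Because `intersect()` treats each row as a set, a value counts as shared even when it sits in a different column in each row, and repeated values within a row are counted once.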


But I don't know how to create a loop or a function to run through the whole dataset and give me a distance matrix like this:



 51 52 53 54 55 56 57 58 59 60
51 0 distance 51 to 52
52 0
53 0
54 0
55 0
56 0
57 0
58 0
59 0
60 0


Could you point me in the right direction as to which kind of loop or function to use for this problem?
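One possible direction, as a rough sketch rather than a definitive answer: wrap the pairwise expression in a function of two row indices and let `outer()` with `Vectorize()` apply it to every pair of rows. The helper name `pair_dist`, the `1 - similarity` conversion and the forced zero diagonal are assumptions made for this illustration; the data are the ten rows from the example table.

```r
# Example data copied from the table in the question
data <- data.frame(
  npi = 51:60,
  a = c(6, 6, 10, 7, 7, 8, 7, 5, 8, 1),
  b = c(2, 2, 9, 4, 10, 5, 2, 9, 4, 10),
  c = c(1, 6, 2, 7, 5, 7, 10, 3, 6, 2),
  stringsAsFactors = FALSE
)
vars <- c("a", "b", "c")

# Hypothetical helper: 1 minus the share of values that rows i and j have in common
pair_dist <- function(i, j) {
  x <- unlist(data[i, vars])
  y <- unlist(data[j, vars])
  1 - length(intersect(x, y)) / length(vars)
}

# Apply the helper to every pair of row indices
n <- nrow(data)
dist_mat <- outer(seq_len(n), seq_len(n), Vectorize(pair_dist))
dimnames(dist_mat) <- list(data$npi, data$npi)

# A row with repeated values (e.g. NPI 52: 6, 2, 6) would otherwise get a
# non-zero distance to itself, so the diagonal is set to 0 explicitly
diag(dist_mat) <- 0

# Optional: convert to a "dist" object for use with hclust() and similar functions
d <- as.dist(dist_mat)
```

The same approach should work unchanged when the columns hold character values such as names and addresses, since `intersect()` compares strings as well. It compares every pair of rows, so the running time grows quadratically with the number of NPIs.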




