I have a dataset of NPIs in which the information on each NPI is stored mostly in string variables.
I've simplified it for this example:
data <- as.data.frame(cbind(51:60, sample(1:10, 10, replace = TRUE), sample(1:10, 10, replace = TRUE), sample(1:10, 10, replace = TRUE)), stringsAsFactors = FALSE)
colnames(data) <- c("npi", "a", "b", "c")
For instance:
npi a b c
51 6 2 1
52 6 2 6
53 10 9 2
54 7 4 7
55 7 10 5
56 8 5 7
57 7 2 10
58 5 9 3
59 8 4 6
60 1 10 2
I want to create a distance matrix showing the relative distances between the different NPIs.
Two NPIs should have a large distance when they are not very similar and a small distance when they are very similar. By similar I mean that they share values on the variables. The variables in the real dataset are names and addresses, so I cannot simply use dist().
This is how I compare two NPIs (the share of values they have in common):
length(intersect(npi1,npi2))/3
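As written, that expression grows when two NPIs share more values, so it behaves like a similarity rather than a distance. A minimal sketch of turning it into a distance, assuming "1 minus the shared fraction" is acceptable; the helper name `npi_distance` and the vectors `npi51` and `npi52` are hypothetical, taken from the first two rows of the example table:

```r
# Hypothetical helper: distance between two rows based on shared values.
# intersect() treats its inputs as sets, so column position and duplicate
# values are ignored.
npi_distance <- function(x, y) {
  shared <- length(intersect(x, y))
  # Assumption: both rows have the same number of variables (3 here).
  1 - shared / length(x)
}

npi51 <- c(a = 6, b = 2, c = 1)
npi52 <- c(a = 6, b = 2, c = 6)

npi_distance(npi51, npi52)  # shared values are 6 and 2, so 1 - 2/3
```

One caveat of the set-based comparison: a row with repeated values, such as 6 2 6, does not get distance 0 to itself under this formula, because only two distinct values are counted against a denominator of three.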
What I don't know is how to write a loop or a function that runs through the whole dataset and gives me a distance matrix like this:
     51  52  53  54  55  56  57  58  59  60
51    0  distance 51 to 52
52        0
53            0
54                0
55                    0
56                        0
57                            0
58                                0
59                                    0
60                                        0
Could you point me in the right direction as to which kind of loop or function to use for this problem?
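One possible direction, offered as a sketch rather than a definitive answer: fill a square matrix with a nested loop over all pairs of rows, then optionally convert it with `as.dist()`. The seed, the helper `npi_distance`, and the object names `vars` and `dist_mat` are assumptions made for illustration, and the distance is taken to be 1 minus the shared fraction.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
data <- data.frame(
  npi = 51:60,
  a = sample(1:10, 10, replace = TRUE),
  b = sample(1:10, 10, replace = TRUE),
  c = sample(1:10, 10, replace = TRUE)
)

# Hypothetical helper: 1 minus the fraction of values two rows share.
npi_distance <- function(x, y) 1 - length(intersect(x, y)) / length(x)

vars <- as.matrix(data[, c("a", "b", "c")])
n <- nrow(vars)
ids <- as.character(data$npi)

# Start from zeros so the diagonal stays 0, then fill each pair once
# and mirror it, since the measure is symmetric.
dist_mat <- matrix(0, n, n, dimnames = list(ids, ids))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d <- npi_distance(vars[i, ], vars[j, ])
    dist_mat[i, j] <- d
    dist_mat[j, i] <- d
  }
}

dist_mat
as.dist(dist_mat)  # "dist" object, usable with e.g. hclust()
```

With string variables such as names and addresses, the same loop should work unchanged as long as the columns are character rather than factor, since `intersect()` also compares character vectors.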