Importing the Etalab Finess files

Context

The data.gouv.fr website regularly publishes an extract of the Finess database of healthcare establishments in a well-defined format.

Two (or three) types of extract are of interest here:

  • finess et for structures (establishments, or geographic entities)
    • version with geocoding
    • version without geocoding
  • finess ej for legal entities

The format of these data is documented in a pdf that contains all the information needed to import them.

We provide a program that extracts the file formats from the pdf and thus almost fully automates the import (another program downloads the files automatically).

Extracting the data format information from the pdf

This part of the Rmarkdown shows what the program pgm/extraire_formats_2.R does to extract the format of the establishments file (structure et). Another program, pgm/extraire_formats_ej.R, does the same for legal entities.

Column labels and ranks

We locate in the pdf the part that maps column names to their labels and to their ranks in the csv file.

Hierarchical XML part

We locate in the pdf the part that contains the xsd file describing the data.

Preview of the xsd file:

library(dplyr, warn.conflicts = FALSE)
readr::read_lines('data_results/formats/format_etalabcs1100507.xsd') %>% 
  head(20) %>% 
  cat(sep = "\n")
## <?xml version="1.0" encoding="UTF-8"?>
## <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
##   <xs:simpleType name="typeNumeroFiness">
##     <xs:restriction base="xs:string">
##       <xs:pattern value="[0-9][0-9A-Z][0-9]{7}"/>
##     </xs:restriction>
##   </xs:simpleType>
##   <xs:simpleType name="typeRaisonSociale">
##     <xs:restriction base="xs:string">
##       <xs:minLength value="1"/>
##       <xs:maxLength value="38"/>
##     </xs:restriction>
##   </xs:simpleType>
##   <xs:simpleType name="typeRaisonSocialeLongue">
##     <xs:restriction base="xs:string">
##       <xs:minLength value="1"/>
##       <xs:maxLength value="60"/>
##     </xs:restriction>
##   </xs:simpleType>
##   <xs:simpleType name="typeComplementRaisonSociale">

Result

Here is a preview of the main variables of the file, in tabular form:

dtttable(readr::read_csv2('data_results/formats/format_etalabcs1100507.csv', 
                          locale = readr::locale(encoding = "latin1")) %>% 
           select(section, name, type, libelle, base, pattern, rang))

Importing the data

We import the finess section, then the geolocalisation section, and join the two.

This is what the program pgm/importer.R does, also adding the column labels. The same program can import the legal entities file in a generic way.
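
The two-step import followed by a join can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code of pgm/importer.R: it assumes that each line of the raw file starts with its section name (structureet or geolocalisation) followed by semicolon-separated fields, it keeps only a handful of columns, and the helper read_section is a hypothetical name. The values are taken from the data preview further down.

```r
# Toy lines standing in for the raw file (assumed layout: section name first,
# then semicolon-separated fields; only a few columns are kept here)
raw_lines <- c(
  "structureet;680018488;750721300;MAISON-RELAIS ARMEE DU SALUT",
  "structureet;280003237;280003211;SELARL PHARMACIE COLIN",
  "geolocalisation;680018488;1024361.6;6747488",
  "geolocalisation;280003237;575818.2;6820330"
)

# Hypothetical helper: parse the lines of one section into a data frame,
# reading every field as text so that identifiers keep their leading zeros
read_section <- function(lines, section, col_names) {
  keep <- lines[startsWith(lines, paste0(section, ";"))]
  out <- read.table(text = keep, sep = ";", quote = "", comment.char = "",
                    header = FALSE, col.names = c("section", col_names),
                    colClasses = "character", stringsAsFactors = FALSE)
  out$section <- NULL
  out
}

finess <- read_section(raw_lines, "structureet",
                       c("nofinesset", "nofinessej", "rs"))
geoloc <- read_section(raw_lines, "geolocalisation",
                       c("nofinesset", "coordxet", "coordyet"))
geoloc$coordxet <- as.numeric(geoloc$coordxet)
geoloc$coordyet <- as.numeric(geoloc$coordyet)

# Left join: keep every establishment, whether it is geolocated or not
joined <- merge(finess, geoloc, by = "nofinesset", all.x = TRUE)
print(joined)
```

The left join (all.x = TRUE) is a design choice of this sketch: it keeps establishments that have no geolocalisation line, with missing coordinates.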

Adding coordinates in WGS84

The geolocation coordinates of establishments in mainland France and Corsica use the Lambert 93 projection, but other projections are found for the overseas departments and territories (DOM-TOM):

finess_et <- readr::read_rds('data_results/etalab_cs1100507_stock_20180129-0428.rds')

knitr::kable(finess_et %>% 
  select(sourcecoordet) %>% 
  mutate(proj = stringr::str_split_fixed(sourcecoordet, ",", n = 7)[,7]) %>% 
  count(proj))
proj          n
(empty)       6
LAMBERT_93    93745
UTM_N20       1027
UTM_N21       11
UTM_N22       230
UTM_S38       51
UTM_S40       759
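
For readers without stringr, the same extraction can be sketched in base R: the projection name is the seventh comma-separated field of sourcecoordet. The helper nth_field is a hypothetical name; the first two values come from the data preview, while the UTM_N20 value is made up for the example.

```r
sourcecoordet <- c(
  "1,ATLASANTE,100,IGN,BD_ADRESSE,V2.2,LAMBERT_93",  # from the data preview
  "2,ATLASANTE,100,IGN,BD_ADRESSE,V2.2,LAMBERT_93",  # from the data preview
  "1,ATLASANTE,100,IGN,BD_ADRESSE,V2.2,UTM_N20",     # hypothetical value
  ""                                                 # no geocoding source
)

# Hypothetical helper: n-th comma-separated field, "" when the field is missing
nth_field <- function(x, n) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) if (length(p) >= n) p[n] else "", character(1))
}

proj <- nth_field(sourcecoordet, 7)
proj_counts <- table(proj)
print(proj_counts)
```

Returning an empty string for missing fields mirrors the empty projection row in the table above.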

We will add two columns holding the coordinates in the usual system for web maps, WGS84, so that all departments share the same coordinate system.

This is what the program pgm/ajout_coordonnees_wgs84.R does.
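
As an illustration of the conversion, here is a minimal sketch that relies on the sf package. It is not the code of pgm/ajout_coordonnees_wgs84.R. The mapping from projection names to EPSG codes is an assumption: 2154 is the usual code for RGF93 / Lambert-93, while the UTM entries use the generic WGS 84 / UTM zone codes, whereas the source data may rely on local datums for the overseas territories. The helper to_wgs84 is a hypothetical name.

```r
# Assumed mapping from the projection names in sourcecoordet to EPSG codes
epsg_codes <- c(
  LAMBERT_93 = 2154L,   # RGF93 / Lambert-93
  UTM_N20    = 32620L,  # WGS 84 / UTM zone 20N (assumption)
  UTM_N21    = 32621L,  # WGS 84 / UTM zone 21N (assumption)
  UTM_N22    = 32622L,  # WGS 84 / UTM zone 22N (assumption)
  UTM_S38    = 32738L,  # WGS 84 / UTM zone 38S (assumption)
  UTM_S40    = 32740L   # WGS 84 / UTM zone 40S (assumption)
)

# Hypothetical helper: convert projected x/y to WGS84 lon/lat, handling one
# projection at a time; rows with an unknown projection or missing
# coordinates stay NA
to_wgs84 <- function(x, y, proj) {
  stopifnot(requireNamespace("sf", quietly = TRUE))
  lon <- lat <- rep(NA_real_, length(x))
  for (p in intersect(unique(proj), names(epsg_codes))) {
    idx <- which(proj == p & !is.na(x) & !is.na(y))
    if (length(idx) == 0) next
    pts <- sf::st_as_sf(data.frame(x = x[idx], y = y[idx]),
                        coords = c("x", "y"), crs = epsg_codes[[p]])
    xy <- sf::st_coordinates(sf::st_transform(pts, 4326))
    lon[idx] <- xy[, "X"]
    lat[idx] <- xy[, "Y"]
  }
  data.frame(lon = lon, lat = lat)
}

# Example point in Mulhouse, taken from the data preview
# (only runs when sf is installed)
if (requireNamespace("sf", quietly = TRUE)) {
  print(to_wgs84(1024361.6, 6747488, "LAMBERT_93"))
}
```

Converting group by group is needed because each set of points must be declared in its own source coordinate system before being transformed to EPSG:4326.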

Maps

Community pharmacies in Île-de-France

finess_et_wgs_84 <- readr::read_rds('data_results/etalab_cs1100507_stock_20180129-0428-wgs84.rds')

pharmacies_idf <- finess_et_wgs_84 %>% 
  filter(departement %in% c('75', '77', '78', '91', '92', '93', '94', '95'),
         categetab == '620')


library(ggplot2)
ggplot() + 
  geom_point(data = pharmacies_idf, 
             aes(x = lon, y = lat,
                 color = paste(departement, libdepartement, sep = " - "))) + 
  coord_map() + 
  ggthemes::theme_map() + 
  ggthemes::scale_color_pander(name = "Département") +
  theme(legend.position = "bottom") + 
  ggtitle("Pharmacies d'officine en Île-de-France")

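Establishments attached to CHU/CHR and CH in mainland France

finess_et_wgs_84 <- readr::read_rds('data_results/etalab_cs1100507_stock_20180129-0428-wgs84.rds')

# Department codes containing a letter are excluded, which drops the overseas
# departments (such as 9B) as well as Corsica (2A, 2B)
etabs <- finess_et_wgs_84 %>% 
  filter(categetab %in% c('101', '355'), !grepl('[A-Z]', departement))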


library(ggplot2)
ggplot() + 
  geom_point(data = etabs, 
             aes(x = lon, y = lat,
                 color = paste(categetab, libcategetab, sep = " - ")), 
             alpha = 0.7, size = 2) + 
  coord_map() + 
  ggthemes::theme_map() + 
  ggthemes::scale_color_pander(name = "") +
  theme(legend.position = "bottom") + 
  ggtitle("CH et CHR/CHU en métropole")

Establishments in Martinique

finess_et_wgs_84 <- readr::read_rds('data_results/etalab_cs1100507_stock_20180129-0428-wgs84.rds')

martinique <- finess_et_wgs_84 %>% 
  filter(departement == '9B')


library(leaflet)
leaflet() %>% 
  addTiles( attribution = "Donn&eacute;es : 
<a href = 'https://www.data.gouv.fr/fr/search/?q=finess' target = '_blank'>Finess Etalab</a>") %>%
      addProviderTiles("CartoDB.Positron") %>% 
      addProviderTiles("Stamen.TonerLines", 
                       options = providerTileOptions(opacity = 0.5)) %>% 
       addProviderTiles("Stamen.Watercolor",options = providerTileOptions(opacity = 0.2)) %>% 
  addCircleMarkers(lng = martinique$lon, lat = martinique$lat, radius = 5, 
                   fill = TRUE,fillOpacity = 0.8,
                   color = '#00802B', stroke = FALSE, 
                   popup = paste0('<b>', martinique$rs, 
                                  '</b><br><em>', martinique$libcategagretab, '</em>'))

Data preview

We draw a sample from the table to show a preview (glimpse):

Geolocated Finess establishments

glimpse(sample_n(finess_et_wgs_84, 500), width = 200)
## Observations: 500
## Variables: 36
## $ nofinesset              <chr> "680018488", "380016881", "280003237", "850025826", "620104893", "830213047", "030782668", "130796873", "210981304", "340019280", "950786764", "870000171", "940811...
## $ nofinessej              <chr> "750721300", "380780247", "280003211", "850025818", "620110734", "830016655", "030783401", "130786049", "210984076", "340785898", "950787002", "870015336", "940806...
## $ rs                      <chr> "MAISON-RELAIS ARMEE DU SALUT", "CMP ADULTES LE BOURG D'OISANS", "SELARL PHARMACIE COLIN", "MSP DE MORTAGNE - LA GAUBRETIERE", "SESSAD  DE \"LE POURQUOI PAS?\"", "...
## $ rslongue                <chr> "", "CMP ADULTES LE BOURG D'OISANS", "SELARL PHARMACIE COLIN", "MAISON DE SANTÉ DE MORTAGNE - LA GAUBRETIERE", "SERVICE D'EDUCATION A DOMICILE INSTITUT MEDICO EDUC...
## $ complrs                 <chr> "", "UF 2216", "", "", "BERTRAND FACON", "", "", "", "ACODEGE", "", "", "", "", "", "CHAINE THERMALE DU SOLEIL", "", "", "", "", "DOMAINE DU CHATEAU CHAVAT", "", "...
## $ compldistrib            <chr> "", "HLM LES PRES DES ROCHES - BAT A", "", "", "", "", "", "", "", "ZAC DE L OVALIE", "", "", "", "FOYER SERVICES JEUNE TRAVAILLEUR", "", "17 A", "", "BIOPARC / BI...
## $ numvoie                 <chr> "45", "276", "9", "2", "12", "", "19", "112", "19", "135", "52", "8", "13", "9", "5", "17", "131", "27", "2", "5", "137", "51", "5", "", "3", "145", "", "33", "15"...
## $ typvoie                 <chr> "R", "R", "R", "R", "R", "R", "RTE", "BD", "R", "R", "R", "R", "R", "BD", "AV", "RTE", "R", "ALL", "R", "ALL", "R", "RTE", "R", "R", "AV", "R", "R", "R", "COUR", "...
## $ voie                    <chr> "BUFFON", "DE BELLEDONNE", "JULES FERRY", "DU DRILLAIS", "DU POURQUOI PAS", "DU DOCTEUR JAUFFRET", "DU STADE", "JEANNE D'ARC", "JEAN-BAPTISTE BAUDIN", "ANDRE PUIG-...
## $ compvoie                <chr> "", "", "", "", "", "", "", "B", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""...
## $ lieuditbp               <chr> "", "", "", "", "", "", "", "", "51838", "", "", "BP 7", "", "", "", "", "", "", "CS 144", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ...
## $ commune                 <chr> "224", "052", "154", "097", "498", "004", "211", "205", "231", "172", "572", "201", "037", "366", "126", "482", "350", "318", "362", "327", "120", "136", "145", "0...
## $ departement             <chr> "68", "38", "28", "85", "62", "83", "03", "13", "21", "34", "95", "87", "94", "14", "34", "67", "59", "33", "26", "33", "75", "69", "45", "57", "92", "59", "45", "...
## $ libdepartement          <chr> "HAUT-RHIN", "ISERE", "EURE-ET-LOIR", "VENDEE", "PAS-DE-CALAIS", "VAR", "ALLIER", "BOUCHES-DU-RHONE", "COTE-D'OR", "HERAULT", "VAL-D'OISE", "HAUTE-VIENNE", "VAL-DE...
## $ ligneacheminement       <chr> "68200 MULHOUSE", "38520 LE BOURG D OISANS", "28190 FONTAINE LA GUYON", "85130 LA GAUBRETIERE", "62300 LENS", "83460 LES ARCS", "03410 PREMILHAT", "13005 MARSEILLE...
## $ telephone               <chr> "0389444356", "0476802286", "0237225073", "0251651159", "0321284019", "0494852300", "0470515369", "0491482656", "0380416867", "0499549670", "0134301991", "05554333...
## $ telecopie               <chr> "", "0476110513", "0237225073", "", "0321785460", "0494852328", "0470515806", "", "0380656963", "0499549671", "0134641175", "0555433303", "0145477734", "0231620158...
## $ categetab               <chr> "258", "156", "620", "603", "182", "202", "246", "156", "344", "500", "214", "109", "202", "257", "126", "354", "194", "124", "344", "362", "354", "620", "252", "5...
## $ libcategetab            <chr> "Maisons Relais - Pensions de Famille", "Centre Médico-Psychologique (C.M.P.)", "Pharmacie d'Officine", "Maison de santé (L.6223-3)", "Service d'Éducation Spéciale...
## $ categagretab            <chr> "4607", "1111", "3201", "2103", "4106", "4401", "4302", "1111", "4504", "4401", "4601", "1107", "4401", "4602", "1205", "4605", "4104", "2206", "4504", "1109", "46...
## $ libcategagretab         <chr> "Logements en Structure Collective", "Autres Etablissements de Lutte contre les Maladies Mentales", "Commerce de Biens à Usage Médicaux", "Autres structures d'exer...
## $ siret                   <chr> "", "26380021101262", "51371008700026", "", "77563175700074", "41822085100020", "77554832400059", "26130008100559", "33369592200398", "26340028500239", "3047079790...
## $ codeape                 <chr> "", "8610Z", "", "8621Z", "8891B", "", "", "8621Z", "8899B", "8710A", "", "8610Z", "8730A", "8899B", "9604Z", "8690D", "", "8623Z", "", "", "", "", "", "8710A", ""...
## $ codemft                 <chr> "99", "03", "01", "01", "34", "08", "34", "03", "01", "45", "30", "04", "01", "01", "14", "54", "05", "36", "30", "23", "54", "01", "08", "45", "01", "34", "34", "...
## $ libmft                  <chr> "Indéterminé", "ARS établissements Publics de santé dotation globale", "Etablissement Tarif Libre", "Etablissement Tarif Libre", "ARS / DG dotation globale", "Prés...
## $ codesph                 <chr> "", "1", "", "", "", "", "", "1", "", "", "", "6", "", "", "0", "", "", "", "", "1", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "0", "", "...
## $ libsph                  <chr> "", "Etablissement public de santé", "", "", "", "", "", "Etablissement public de santé", "", "", "", "Etablissement de santé privé d'intérêt collectif", "", "", "...
## $ dateouv                 <date> 2007-09-01, 2010-11-01, 2014-04-07, 2014-09-01, 1970-09-14, 1977-06-15, 1979-02-02, 1983-05-01, 1904-04-04, 2011-05-16, 1981-12-22, 1968-06-10, 1981-04-04, 1976-1...
## $ dateautor               <date> 2007-09-01, 2010-11-01, 2013-08-02, 2013-06-13, 2017-01-03, 2017-01-03, 2017-01-03, 1983-05-01, 1904-04-04, 2009-01-13, 2017-01-03, 1964-11-02, 2017-01-03, 1976-1...
## $ datemaj_structureet     <date> 2017-11-23, 2016-09-05, 2017-11-14, 2015-01-14, 2017-04-21, 2017-04-21, 2017-04-21, 2011-04-13, 2013-02-21, 2015-11-26, 2017-04-21, 2014-06-06, 2017-04-21, 2016-1...
## $ coordxet                <dbl> 1024361.6, 938532.9, 575818.2, 390876.4, 687601.4, 981531.4, 664513.3, 895431.1, 854645.4, 768104.9, 636836.8, 554990.2, 652483.0, 497316.6, 706592.0, 1047754.3, 7...
## $ coordyet                <dbl> 6747488, 6444595, 6820330, 6657104, 7037704, 6268844, 6579033, 6246704, 6692753, 6277304, 6884854, 6529401, 6857253, 6897701, 6277121, 6842837, 7060956, 6419243, 6...
## $ sourcecoordet           <chr> "1,ATLASANTE,100,IGN,BD_ADRESSE,V2.2,LAMBERT_93", "2,ATLASANTE,100,IGN,BD_ADRESSE,V2.2,LAMBERT_93", "2,ATLASANTE,100,IGN,BD_ADRESSE,V2.2,LAMBERT_93", "1,ATLASANTE,...
## $ datemaj_geolocalisation <date> 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-01-25, 2018-0...
## $ lat                     <dbl> 0.9727036, -1.0600067, 1.4994665, 0.3593862, 3.0152758, -2.2296470, -0.1420348, -2.3638323, 0.6227776, -2.1537426, 1.9486214, -0.4820104, 1.7581799, 2.0264342, -2....
## $ lon                     <dbl> 5.2061033, 4.5812901, 2.1498712, 0.9137244, 2.9134875, 4.8391095, 2.7620335, 4.2744428, 4.0470749, 3.4452840, 2.5651498, 2.0317211, 2.6736507, 1.6031829, 3.0430997...

Finess legal entities

juridiques <- readr::read_rds('data_results/etalab_cs1100501_stock_20180129-0431.rds')

glimpse(sample_n(juridiques, 500), width = 200)
## Observations: 500
## Variables: 22
## $ nofiness           <chr> "370012585", "920019304", "450019948", "510021157", "750018855", "920018918", "380011478", "330053844", "920019957", "130038557", "260009519", "260020565", "470016361",...
## $ rs                 <chr> "GCS IFSI RÉGION CENTRE", "PHARMACIE ALLAL-BRAMI", "SELAS BPR ANALYSES SPECIALISEES", "PHARMACIE LIOCHON", "PHARMACIE BERTHELOT - FOURREAU", "SARL DOUCE FRANCE SANTE", ...
## $ rslongue           <chr> "", "PHARMACIE ALLAL-BRAMI", "SELAS BPR ANALYSES SPECIALISEES", "PHARMACIE LIOCHON", "PHARMACIE BERTHELOT - FOURREAU", "SARL DOUCE FRANCE SANTE", "MAISON DU HANDICAP DE...
## $ complrs            <chr> "", "", "ZAC ARBORIA", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "...
## $ numvoie            <chr> "2", "31", "", "1", "28", "12", "4", "54", "16", "6", "3", "", "29", "35", "34", "", "", "16", "6", "111", "", "21", "45", "2", "28", "1328", "3", "11", "91", "32", "23...
## $ typvoie            <chr> "BD", "R", "AV", "PL", "R", "R", "AV", "R", "R", "PL", "R", "PL", "AV", "R", "R", "PL", "R", "R", "R", "R", "PL", "BD", "AV", "AV", "AV", "AV", "R", "RTE", "R", "AV", "...
## $ voie               <chr> "TONNELLÉ", "JEAN JAURES", "DES PLATANES", "SAINT TIMOTHEE", "DU FBG DU TEMPLE", "JEAN JAURES", "DOYEN LOUIS WEIL", "VALENTIN BERNARD", "PAUL VAILLANT COUTURIER", "SADI...
## $ compvoie           <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ...
## $ compldistrib       <chr> "CHRU TOURS", "", "PARC D'ACTIVITÉS ARBORIA", "", "", "", "IMMEUBLE LE PULSAR", "", "", "", "", "HOTEL D ENTREPRISES ECOSITES", "", "", "", "MAISON DE QUARTIER PORT NEU...
## $ lieuditbp          <chr> "", "", "", "", "", "CS10032", "", "", "", "", "", "", "", "", "", "", "LA ROSERAIE", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""...
## $ commune            <chr> "261", "062", "247", "454", "111", "062", "185", "067", "063", "202", "113", "125", "203", "001", "366", "300", "128", "218", "106", "209", "008", "599", "445", "001", ...
## $ ligneacheminement  <chr> "37044 TOURS CEDEX 9", "92800 PUTEAUX", "45700 PANNES", "51100 REIMS", "75011 PARIS", "92800 PUTEAUX", "38000 GRENOBLE", "33710 BOURG SUR GIRONDE", "92500 RUEIL MALMAIS...
## $ departement        <chr> "37", "92", "45", "51", "75", "92", "38", "33", "92", "13", "26", "26", "47", "47", "14", "17", "38", "42", "22", "9B", "83", "59", "64", "13", "77", "76", "42", "27", ...
## $ libdepartement     <chr> "INDRE-ET-LOIRE", "HAUTS-DE-SEINE", "LOIRET", "MARNE", "PARIS", "HAUTS-DE-SEINE", "ISERE", "GIRONDE", "HAUTS-DE-SEINE", "BOUCHES-DU-RHONE", "DROME", "DROME", "LOT-ET-GA...
## $ telephone          <chr> "", "0147783797", "0238858566", "0326852435", "", "0147757807", "0438124848", "0557683038", "0147510133", "0491914939", "0475220744", "0426521124", "0553414294", "", "0...
## $ statutjuridique    <chr> "29", "85", "91", "85", "71", "72", "28", "85", "85", "60", "60", "72", "80", "70", "72", "60", "60", "60", "60", "78", "85", "70", "72", "95", "70", "75", "61", "71", ...
## $ libstatutjuridique <chr> "Groupement de Coopération Sanitaire public", "Soc. Exercice Libéral Responsabilité Limitée (S.E.L.A.R.L.)", "Société Exercice Libéral par Actions Simplifiée (S.E.L.A.S...
## $ categetab          <chr> "", "", "699", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "699", "699", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",...
## $ libcategetab       <chr> "", "", "Entité Ayant Autorisation", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "Entité Ayant Autorisation", "Entité Ayant Autorisation...
## $ siren              <chr> "130015621", "448769612", "503881633", "329852578", "384964011", "401916564", "130001027", "", "507652873", "409427960", "779410935", "521830505", "822745857", "", "508...
## $ codeape            <chr> "8412Z", "", "", "", "", "", "", "", "", "8810A", "8810A", "", "8621Z", "", "", "", "8720A", "8810A", "8810A", "", "", "", "", "8622C", "", "8610Z", "8899B", "", "", ""...
## $ datecrea           <date> 2010-03-05, 1979-03-15, 2016-12-27, 1942-04-10, 1942-11-11, 2007-11-20, 2006-01-01, 2001-01-01, 1944-05-23, 2001-01-01, 2001-01-01, 2010-04-28, 2011-03-18, 2007-07-30,...
Simap - DOMU

2018-04-05