pandas.io.json.json_normalize with very nested JSON

In the pandas example below, what do the square brackets mean? Is there a logic to follow for going deeper with the []? […]

result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])

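Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second json_normalize() argument (record_path, set to 'counties' in the documentation example) tells the function how to select the elements from the input data structure that make up the rows in the output, and the meta paths add further metadata that will be included with each of those rows. Think of these as table joins in a database, if you will.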
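
To make the "path" idea concrete, here is a minimal plain-Python sketch of how a meta entry could be resolved against one top-level record. The helpers pull_field and column_name are hypothetical names written for this illustration; this is not the pandas implementation, only a model of the behaviour described above.

```python
def pull_field(obj, path):
    """Follow `path` into nested dicts; a bare string is a one-step path."""
    keys = [path] if isinstance(path, str) else list(path)
    value = obj
    for key in keys:
        value = value[key]
    return value


def column_name(path, sep="."):
    """Join a multi-step path into a single column label, e.g. 'info.governor'."""
    return path if isinstance(path, str) else sep.join(path)


# One top-level record from the documentation example (without its counties).
record = {
    'state': 'Florida',
    'shortname': 'FL',
    'info': {'governor': 'Rick Scott'},
}

meta = ['state', 'shortname', ['info', 'governor']]
resolved = {column_name(p): pull_field(record, p) for p in meta}
print(resolved)
# {'state': 'Florida', 'shortname': 'FL', 'info.governor': 'Rick Scott'}
```

So a plain string names a key on the top-level object, and an inner list means "descend one key per element".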

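The input for the US states documentation example is a list of two dictionaries, and both of these dictionaries have a counties key that references another list of dicts: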

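>>> from pprint import pprint
>>> from pandas import json_normalize  # pandas >= 1.0; older releases: from pandas.io.json import json_normalize
>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...          'info': {'governor': 'Rick Scott'},
...          'counties': [{'name': 'Dade', 'population': 12345},
...                       {'name': 'Broward', 'population': 40000},
...                       {'name': 'Palm Beach', 'population': 60000}]},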
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]

Between them there are 5 rows of data to use in the output:

>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337

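The meta argument then names some elements that live next to those counties lists, and those are merged in separately. The values of those meta elements from the first data[0] dictionary are ('Florida', 'FL', 'Rick Scott'), and from data[1] they are ('Ohio', 'OH', 'John Kasich'). Those values are attached to the counties rows that came from the same top-level dictionary, so they are repeated 3 and 2 times respectively: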

>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

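So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.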
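
The join-like behaviour can be sketched in plain Python as a nested loop: the outer loop visits each top-level dictionary, the inner loop visits its records, and the meta values are copied onto every record. normalize_sketch and pull_field are hypothetical helpers for this illustration only. They assume a single-key record path and are not how pandas implements it internally.

```python
def pull_field(obj, path):
    """Resolve a meta path (a string or a list of keys) against a dict."""
    for key in ([path] if isinstance(path, str) else path):
        obj = obj[key]
    return obj


def normalize_sketch(data, record_key, meta_paths):
    """Return one dict per record, with the parent's meta values merged in."""
    rows = []
    for parent in data:
        # Resolve every meta path once per top-level dictionary.
        meta_values = {
            (p if isinstance(p, str) else ".".join(p)): pull_field(parent, p)
            for p in meta_paths
        }
        # Each record becomes a row; the meta values repeat on every row.
        for record in parent[record_key]:
            rows.append({**record, **meta_values})
    return rows


data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

rows = normalize_sketch(data, 'counties', ['state', 'shortname', ['info', 'governor']])
for row in rows:
    print(row)
```

This prints five dictionaries, three carrying the Florida values and two carrying the Ohio values, which mirrors the DataFrame shown above.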

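In your example JSON, there are only a few nested lists that can be elevated with the first argument, the way 'counties' was in the documentation example. The only such list in that data structure is the nested 'authors' key; you'd have to extract each ['_source', 'authors'] path, after which you can add other keys from the parent object to augment those rows.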

The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.

The record_path argument takes the authors lists as the starting point. These look like:

>>> d['hits']['hits'][0]['_source']['authors']   # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
  'author_name': 'barbara eileen ryan'}]
>>> # etc.

and so gives you the following rows:

>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland
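
The NaN entries in the affiliations column appear because some author dictionaries simply lack that key. A rough plain-Python model of that alignment, using two of the author records shown above, might look like this. Sorting the column names is an assumption made only to get a deterministic order in the sketch.

```python
import math

authors = [
    {'affiliations': ['Punjabi University'],
     'author_id': '780E3459',
     'author_name': 'munish puri'},
    {'author_id': '7FF872BC',
     'author_name': 'barbara eileen ryan'},  # no 'affiliations' key
]

# The columns are the union of all keys seen across the records.
columns = sorted({key for record in authors for key in record})

# A record that lacks a column gets NaN in that position.
table = [[record.get(col, math.nan) for col in columns] for record in authors]

print(columns)
for row in table:
    print(row)
```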

and then we can use the third argument, meta, to add more columns such as _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:

>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id   \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3  
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title  \
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...
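
Putting both steps together for the nested case, here is a small plain-Python sketch. The hits list below is made-up toy data that only mimics the shape of the question's JSON (the ids, titles and names are placeholders), and pull_records is a hypothetical helper. It models the behaviour described above, including skipping an authors value of None, and is not pandas's own code.

```python
def pull_records(obj, record_path):
    """Walk `record_path` key by key; treat a None value as 'no records'."""
    for key in record_path:
        obj = obj[key]
    return [] if obj is None else obj


# Toy data shaped like d['hits']['hits']; every value is a placeholder.
hits = [
    {'_id': 'id-0', '_source': {'title': 'title zero', 'authors': None}},
    {'_id': 'id-1', '_source': {'title': 'title one',
                                'authors': [{'author_name': 'author a'},
                                            {'author_name': 'author b'}]}},
    {'_id': 'id-2', '_source': {'title': 'title two',
                                'authors': [{'author_name': 'author c'}]}},
]

rows = []
for hit in hits:
    # record_path=['_source', 'authors'] selects the rows ...
    for author in pull_records(hit, ['_source', 'authors']):
        # ... and the meta paths '_id' and ['_source', 'title'] decorate them.
        rows.append({**author,
                     '_id': hit['_id'],
                     '_source.title': hit['_source']['title']})

for row in rows:
    print(row)
```

The first hit contributes no rows at all because its authors value is None, while the other hits contribute one row per author with the parent's _id and title repeated.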

Other 2022/1/1 18:13:34, 727 views
