In the pandas example below, what do the square brackets mean? Is there a logic to follow with the []? […]
result = json_normalize(data, 'counties', ['state', 'shortname',
                                           ['info', 'governor']])
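Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include in addition to the selected rows. The second json_normalize() argument (record_path, set to 'counties' in the documentation example) tells the function how to select the elements of the input data structure that make up the rows of the output, and the meta paths add further metadata that is included with each of those rows. Think of it as a table join in a database, if you will.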
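The input for the US states documentation example is a list of two dictionaries, and both of these dictionaries have a counties key that references another list of dictionaries: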
>>> from pprint import pprint
>>> from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...          'info': {'governor': 'Rick Scott'},
...          'counties': [{'name': 'Dade', 'population': 12345},
...                       {'name': 'Broward', 'population': 40000},
...                       {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]
Between them, there are 5 rows of data to use in the output:
>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337
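As a rough illustration of what record_path='counties' selects, the same five rows can be collected by hand in plain Python. This is only a sketch of the idea, not how pandas implements it, and the variable name rows is made up for the example.

```python
# Same sample data as in the documentation example.
data = [
    {'state': 'Florida', 'shortname': 'FL', 'info': {'governor': 'Rick Scott'},
     'counties': [{'name': 'Dade', 'population': 12345},
                  {'name': 'Broward', 'population': 40000},
                  {'name': 'Palm Beach', 'population': 60000}]},
    {'state': 'Ohio', 'shortname': 'OH', 'info': {'governor': 'John Kasich'},
     'counties': [{'name': 'Summit', 'population': 1234},
                  {'name': 'Cuyahoga', 'population': 1337}]},
]

# record_path='counties': take every dict from every 'counties' list, in order.
# Each of those dicts becomes one output row.
rows = [county for state in data for county in state['counties']]

for index, row in enumerate(rows):
    print(index, row['name'], row['population'])
```

Nothing from the enclosing state dictionaries appears in these rows yet.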
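The meta argument then names some elements that live next to those counties lists, and these are merged in separately. For the first dictionary, data[0], the values of those meta elements are ('Florida', 'FL', 'Rick Scott'); for data[1] they are ('Ohio', 'OH', 'John Kasich'). Those values are attached to the counties rows that came from the same top-level dictionary, so they are repeated 3 and 2 times respectively: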
>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
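The join-like behaviour can be spelled out by hand. The sketch below hard-codes the three meta paths from the example and copies their values onto every county row from the same state; it mimics the result, and should not be read as pandas' actual implementation.

```python
# Same sample data as in the documentation example.
data = [
    {'state': 'Florida', 'shortname': 'FL', 'info': {'governor': 'Rick Scott'},
     'counties': [{'name': 'Dade', 'population': 12345},
                  {'name': 'Broward', 'population': 40000},
                  {'name': 'Palm Beach', 'population': 60000}]},
    {'state': 'Ohio', 'shortname': 'OH', 'info': {'governor': 'John Kasich'},
     'counties': [{'name': 'Summit', 'population': 1234},
                  {'name': 'Cuyahoga', 'population': 1337}]},
]

rows = []
for state in data:
    for county in state['counties']:
        row = dict(county)  # copy, so the input data is left untouched
        # The three meta paths, looked up on the parent (state) dictionary.
        row['state'] = state['state']
        row['shortname'] = state['shortname']
        row['info.governor'] = state['info']['governor']
        rows.append(row)

for row in rows:
    print(row)
```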
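So, if you pass in a list for the meta argument, each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.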
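To make the two forms of a path concrete: a bare string is a single key, while a nested list is a sequence of keys to follow one after another. The helpers below (as_path, lookup and column_name) are hypothetical names invented for this sketch, and the '.' separator is assumed from the info.governor column name in the output above.

```python
def as_path(spec):
    """Treat a bare string as a one-step path; a list is already a path."""
    return [spec] if isinstance(spec, str) else list(spec)


def lookup(obj, spec):
    """Follow the keys of the path into nested dictionaries."""
    for key in as_path(spec):
        obj = obj[key]
    return obj


def column_name(spec, sep='.'):
    """Join the keys of the path into a single column label."""
    return sep.join(as_path(spec))


record = {'state': 'Florida', 'shortname': 'FL',
          'info': {'governor': 'Rick Scott'}}
meta = ['state', 'shortname', ['info', 'governor']]

# One extra column per element of the meta list.
extra = {column_name(spec): lookup(record, spec) for spec in meta}
print(extra)
```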
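In your example JSON, there are only a few nested lists that could be elevated with the first argument, the way 'counties' was in the example. The only one in that data structure is the nested 'authors' key; you would have to extract each ['_source', 'authors'] path, after which you can add other keys from the parent object to augment those rows.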
The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.
The record_path argument takes the authors lists as the starting point; these look like this:
>>> d['hits']['hits'][0]['_source']['authors'] # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
  'author_name': 'barbara eileen ryan'}]
>>> # etc.
and so it gives you the following rows:
>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland
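A nested record_path works the same way as a single key, only with more than one step. The sketch below uses a cut-down, partly made-up stand-in for d['hits']['hits'] (three hits, with all other fields omitted) and a hypothetical helper records_at to show how a None value contributes no rows.

```python
# Reduced stand-in for d['hits']['hits']; only the authors lists are kept.
hits = [
    {'_source': {'authors': None}},
    {'_source': {'authors': [
        {'affiliations': ['Punjabi University'],
         'author_id': '780E3459',
         'author_name': 'munish puri'},
    ]}},
    {'_source': {'authors': [
        {'author_id': '7FF872BC',
         'author_name': 'barbara eileen ryan'},
    ]}},
]


def records_at(obj, path):
    """Follow each key in path; a None value yields no records."""
    for key in path:
        obj = obj[key]
    return [] if obj is None else obj


rows = [author
        for hit in hits
        for author in records_at(hit, ['_source', 'authors'])]

for row in rows:
    print(row)
```

The second author here has no affiliations key at all; in the DataFrame output above, that gap shows up as NaN.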
We can then use the third argument, meta, to add more columns such as _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:
>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id  \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal  \
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...